HUMAN DNA SEQUENCES
Background of the Invention 

Current methods for testing pharmacological substances rely on a three-stage testing 
approach to drug development. First, candidate compounds are typically screened in some 
sort of in vitro system, like inhibition of cancer cell growth. Candidates are then tested in 
an animal model, as a first approximation of systemic effects, including efficacy and 
toxicity. Compounds that still show promise after these initial in vivo screens are finally
tested in humans. Again, human testing typically occurs in three phases: toxicity; 
preliminary efficacy; and efficacy. The entire process can take more than a decade and cost 
hundreds of millions of dollars. Aside from the monetary costs and protracted time scale, 
moreover, current testing regimes waste the lives of countless laboratory animals and 
needlessly endanger the lives of human subjects. 

A need exists, therefore, for more sophisticated drug screening techniques that can 
be done rapidly in vitro. These screening techniques ideally will be reflective of systemic 
and/or organ-specific responses, so that they provide a reliable indicator of action in a 
human body. Current techniques, however, tend to utilize only a single or limited number 
of markers, thus answering only very simple questions that are of questionable medical 
import. For example, a typical in vitro assay may ask whether a lead compound binds a 
particular receptor, which has been implicated in a certain disorder. It is presumed that 
such binding is indicative of therapeutic usefulness, but it does not even purport to address 
systemic effects. 

Not only are screening techniques for efficacy inadequate, the available toxicity 
screens likewise are inadequate. Toxicity, on a first level, is usually measured by animal 
testing. Aside from the complications related to in vivo versus in vitro testing, such screens 
are insufficient because of differences in metabolism, uptake, etc., relative to humans. 
Thus, improved methods would not only be in vitro-based, they would also be more
"human." 

With the increasing miniaturization of screening assays and the growing availability 
of targets for pharmaceutical intervention, there is increasing interest in developing arrays 
containing large numbers of these targets that can be assayed simultaneously. If such an 
array contains a large enough population of targets, it can be used to essentially mimic the
systemic response. In other words, the array becomes an in vitro surrogate for the human 
body. The more refined the array, the more accurate the predictive capability. In theory, 
an array could be constructed that can detect all of the known human expression products 
simultaneously, thereby providing a very reliable indicator of the human response to a
given compound. These arrays offer advantages over the present in vitro screening systems
in that they can assay large numbers of responses simultaneously. They are superior to
animal testing because they are more "human" and, thus, more predictive of human
responses. 

In order to construct such arrays, however, the field is in need of further human
targets. Advantageously, such targets will be provided with additional physiologically 
relevant information, such as whether the target is expressed in a particular tissue and 
whether it is related to a known functional class of targets. In this way, the artisan can 
focus as needed, for example, on tissue-specific effects or target class-specific effects, 
thereby providing information useful in evaluating efficacy and/or toxicity. 

In addition to a need for pharmacological screening targets, there is a need for 
further pharmacological substances. These substances can be used in the formulation of 
medicinal compositions and in treating a wide variety of disorders. 

The present invention responds to the aforementioned and other needs in the field by 
providing a population of novel targets useful, inter alia, in the profiling and medicinal
contexts described above. 

Summary of the Invention 

It is an object of the invention, therefore, to provide a set of human cDNA clones. 
Further to this object, the invention provides sequences of human cDNA clones that were 
isolated from libraries generated from different human tissues. 

It is another object of the invention to provide assemblages of targets useful in 
profiling matrices for screening pharmacological test compounds. According to this object, 
assemblages comprising different populations of human nucleic acids, proteins and 
antibodies are provided. In different embodiments, cDNA library-specific assemblages and 
target-family-specific targets are provided. 




WO 01/12659 PCT/IB00/01496

It is a further object of the invention to provide a database of human nucleotide and 
protein sequences. Further to this object, novel human nucleotide and protein sequences 
are provided in electronic form. In one embodiment, one or more of these sequences is 
provided in a searchable database. 

It is still another object of the invention to provide biologically active target 
molecules useful in treating or detecting human disorders. Further to this object, the 
invention provides nucleic acid and protein molecules that have the capacity to affect 
disease etiology or symptoms or correlate with known disease states. Also further to this 
object, a database is provided which comprises the disclosed molecules in electronic form. 

It is still a further object of the invention to provide polypeptides encoded by the 
human cDNA clones disclosed herein. Further to this object, the invention provides 
antibodies and fragments thereof that are capable of binding to a specific portion of these 
polypeptides. 

It is yet another object of the invention to provide pharmaceutical compositions which 
comprise an effective amount of a pharmaceutical agent, wherein the pharmaceutical agent is 
selected from the group consisting of one or more polypeptides contemplated by the invention, 
variants or functional derivatives thereof, and antibodies thereto; and a physiologically 
acceptable carrier or excipient. 

It is still another object of the invention to provide expression vectors comprising one 
or more human cDNA clones disclosed herein or fragments thereof; and optionally a 
promoter operably linked to the cDNA clone or fragment thereof. Further to this object, the
invention provides methodology for recombinantly producing a desired peptide, comprising
expressing in a host cell a peptide encoded by a human cDNA clone disclosed herein. 

Detailed Description

The invention results from a need in the art for new human nucleic acids and proteins. 
This need arises in several contexts. First, there is a need to identify targets for therapeutic 
intervention. Second, there is a need to identify molecules that may be adversely affected in a 
therapeutic context, thereby resulting in toxicity. Knowledge of these molecules will aid in 
the design of new medicaments with enhanced efficacy and decreased toxicity. Finally, the
need encompasses human nucleic acids and proteins that have medicinal applicability in their 
own right. 

In view of these needs, the present inventors set out to isolate and sequence human 
cDNAs from tissue-specific libraries. In this way, they represent subsets of molecules likely 
to be targets for therapeutic intervention or for avoiding toxicity. In addition, the inventors 
divided the molecules into various sub-categories, based on suspected functionality, structural 
similarity, etc., which are of interest from a pharmacological perspective. These molecules are
disclosed in provisional application serial nos. 60/149,499 and 60/156,503, filed August 18, 
1999, and September 28, 1999, respectively, both of which are hereby incorporated by 
reference in their entirety. 

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES 

The present invention provides novel polynucleotide molecules that, in some 
instances, have similarities with known molecules. The inventive DNAs were cloned from 
five different human cDNA libraries. In addition to these DNA molecules, the invention
provides their protein translations and antibodies derived from them. The inventive DNA and
protein sequences are shown individually, below. The inventive nucleic acids also include the
complements of these DNA sequences, as well as their RNA counterparts. Methods of 
producing the molecules also are provided. Further, the invention provides methods for 
detecting all or part of the molecules and of detecting polynucleotides encoding all or part of 
the molecules. 

The inventive molecules derive from five cDNA libraries: human fetal brain; human 
fetal kidney; human mammary carcinoma; human testis; and human uterus. For convenience, 
each sequence bears a designation that indicates from which library it is derived. In 
particular, these designations are: "hfbr" for human fetal brain; "hfkd" for human fetal
kidney; "hmcf" for human mammary carcinoma; "htes" for human testis; and "hute" for
human uterus. The individual libraries were constructed and screened as described below in 
the examples. 

The protein and DNA molecules of the invention are variously described herein as 
"target" molecules or "inventive" molecules. The sequences and other information pertinent 
to the nucleic acid and protein molecules of the invention are shown, below. 
Interpreting the data disclosed with the Table and cDNA sequences, below:

The table and data below provide the coding sequences of the inventive cDNAs as 
well as the protein sequences and other useful information, as set out below. 

Grouping 

The clones were assigned to the following fourteen functional and/or tissue-derived
groups:

1. Cell Cycle 

2. Cell Structure and Motility 

3. Differentiation/Development

4. Intracellular Transport and Trafficking 

5. Metabolism 

6. Nucleic Acid Management 

7. Signal Transduction 

8. Transmembrane Protein 

9. Transcription Factors 

10. Brain derived 

11. Kidney derived 

12. Mammary Carcinoma derived

13. Testes derived 

14. Uterus derived 

Description of Clone Files 

The individual clone files are structured in the same pattern. The Sections are 
separated by paragraphs. 

1. Clone Name 

The clone names are deciphered with reference to the following example:
DKFZphfkd2_24e23, wherein the code represents: 

• producer of library ("DKFZ") (for convenience, this reference may be 
eliminated) 

• a "p" for "plasmid cDNA library" (for convenience, this reference may be 
eliminated)

• library name (e.g. hfbr = human fetal brain; hfkd = human fetal kidney; hmcf = 
human mammary carcinoma; htes = human testes; hute = human uterus) 

• an underscore ("_") to separate library information from plate information

• plate number (e.g. "16") 

• plate coordinates (letter first; e.g. "f14")
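The naming scheme above can be illustrated with a short Python sketch that splits a clone name into its parts. The function name `parse_clone_name`, the field names and the regular expression are illustrative assumptions, not part of the disclosure. The pattern further assumes that a library name consists of lower-case letters followed by optional digits (as in "hfkd2"), that the plate coordinate is a single letter followed by a number, and that no library name itself begins with "p".

```python
import re
from typing import NamedTuple


class CloneName(NamedTuple):
    producer: str  # "DKFZ", or "" when the prefix was dropped
    library: str   # library name, e.g. "hfkd2"
    plate: int     # plate number
    row: str       # letter of the plate coordinate
    column: int    # number of the plate coordinate


# Assumed pattern: optional "DKFZ" and "p" prefixes (the text allows both to
# be eliminated), library name, underscore, plate number, plate coordinate.
_CLONE_RE = re.compile(r"(?:(DKFZ)?p)?([a-z]+\d*)_(\d+)([a-z])(\d+)")


def parse_clone_name(name: str) -> CloneName:
    """Split a clone name such as 'DKFZphfkd2_24e23' into its components."""
    match = _CLONE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised clone name: {name!r}")
    producer, library, plate, row, column = match.groups()
    return CloneName(producer or "", library, int(plate), row, int(column))


print(parse_clone_name("DKFZphfkd2_24e23"))
```

For the example name, the sketch reports producer "DKFZ", library "hfkd2", plate 24 and coordinate e23.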

2. Group 
3. Introduction

short review of the similarities, function of the protein and possible applications 

4. Short Information 

specifications about the cDNA (who sequenced, completeness of the cDNA, similarity,
chromosomal localisation, length of cDNA, localisation of poly A tail and
polyadenylation signal)

5. cDNA-Sequence

6. BLASTn Results 

search results of blasting the cDNA sequence against all public databases 

7. Medline Entries 

information about genes/proteins similar to the novel cDNA (if available) 

8. Putative Encoded Protein Information 

specifications about the encoded protein (ORF: length and localisation of the reading frame) 

9. Protein Sequence 

10. BLASTp Results 

search results of blasting the protein sequence against all public databases 

11. Pedant Information 

output of fully automated annotation: summarises peptide information, homologies, patterns 
as follows: 

[Length] 

- length of the protein = number of amino acid residues 

[MW] 
- molecular weight of the protein

[pI]

- isoelectric point 
[HOMOL] 

- shows protein with closest similarity to the cDNA-encoded protein 
[FUNCAT] 

- functional information according to a catalogue developed by Munich
Information Center for Protein Sequences (MIPS)

[BLOCKS] 

- Blocks are multiply aligned ungapped segments corresponding to the most 
highly conserved regions of proteins. The blocks for the Blocks Database are made 
automatically by looking for the most highly conserved regions in groups of proteins 
documented in the Prosite Database. The Prosite pattern for a protein group is not 
used in any way to make the Blocks Database and the pattern may or may not be 
contained in one of the blocks representing a group. These blocks are then calibrated 
against the SWISS-PROT database to obtain a measure of the chance distribution of
matches. It is these calibrated blocks that make up the Blocks Database. The WWW 
versions of the Prosite and SWISS-PROT Databases that are used on this server are 
located at the ExPASy World Wide Web (WWW) Molecular Biology Server of the 
Geneva University Hospital and the University of Geneva. World Wide Web URL 
http://blocks.fhcrc.org/blocks/about_blocks.html/ is the entry point to the database. 

- here Blocks segments found in the analysed protein sequences are displayed
[SCOP] 

Nearly all proteins have structural similarities with other proteins and, in some 
of these cases, share a common evolutionary origin. The scop database provides a 
detailed and comprehensive description of the structural and evolutionary 
relationships between all proteins whose structure is known, including all entries in 
Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of 
tightly linked hypertext documents which make the large database comprehensible 
and accessible. In addition, the hypertext pages offer a panoply of representations of 
proteins, including links to PDB entries, sequences, references, images and interactive 
display systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/ is the
entry point to the database. Existing automatic sequence and structure comparison
tools cannot identify all structural and evolutionary relationships between proteins. 
The scop classification of proteins has been constructed manually by visual inspection 
and comparison of structures, but with the assistance of tools to make the task 
manageable and help provide generality. Proteins are classified to reflect both 
structural and evolutionary relatedness. Many levels exist in the hierarchy, but the 
principal levels are family, superfamily and fold. The exact position of boundaries
between these levels is to some degree subjective. Scop evolutionary classification is
generally conservative: where any doubt about relatedness exists, we made new 
divisions at the family and superfamily levels. 

- here SCOP segments found in the analysed protein sequences are
displayed

[EC] 

ENZYME is a repository of information relative to the nomenclature of 
enzymes. It is primarily based on the recommendations of the Nomenclature 
Committee of the International Union of Biochemistry and Molecular Biology 
(IUBMB) and it describes each type of characterized enzyme for which an EC
(Enzyme Commission) number has been provided. World Wide Web URL 
http://www.expasy.ch/enzyme/ is the entry point to the database. 

- here EC-number and name of enzymes with similarity to the analysed protein 
sequences are displayed 

[PIRKW] 

- functional information according to the Protein Information Resource (PIR)
database catalogue developed by Munich Information Center for Protein Sequences 
(MIPS), the National Biomedical Research Foundation (NBRF) and the International 
Protein Information Database in Japan (JIPID). 

[SUPFAM] 

- information according to the Protein Information Resource (PIR) database 
catalogue of protein superfamilies developed by Munich Information Center for 
Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) 
and the International Protein Information Database in Japan (JIPID). 
[PROSITE] 
please refer to 12. PROSITE Motifs
[PFAM] 

please refer to 13. PFAM Motifs 

[KW] 

- overall 2-dimensional folding information

- 3D indicates that the protein is similar to a protein for which a 3-dimensional
structure is known

- overall structural information 


The last PEDANT-block depicts information about the folding structure of the 
protein generated by PREDATOR. PREDATOR is a secondary structure prediction 
program. It takes as input a single protein sequence to be predicted and can optionally
use a set of unaligned sequences as additional information to predict the query 
sequence. The mean prediction accuracy of PREDATOR is 68% for a single sequence 
and 75% for a set of related sequences. PREDATOR does not use multiple sequence 
alignment. Instead, it relies on careful pairwise local alignments of the sequences in 
the set with the query sequence to be predicted. 

World Wide Web URL http://www.embl-heidelberg.de/argos/predator/predator_info.html
is the entry point to the database.

- H = helix, E = extended or sheet, _ = coil, T = transmembrane, B = beta 

- X indicates a low-complexity region with repeat-like structure which is
omitted in all BLAST searches 

12. PROSITE Motifs 

PROSITE is a database of protein families and domains. It consists of biologically significant 
sites, patterns and profiles that help to reliably identify to which known protein family (if
any) a new sequence belongs. World Wide Web URL http://www.expasy.ch/prosite/ is the 
entry point to the database. A description of the prosite consensus patterns is also provided, 
below. 

13. PFAM Motifs 

PFAM (protein families) is a large collection of multiple sequence alignments and hidden 
Markov models covering many common protein domains. World Wide Web URL
http://www.sanger.ac.uk/Pfam/ is the entry point to the database.


Deposit of Clones 

Clones were deposited as a pool with the American Type Culture Collection under
accession number ______, from which each clone comprising a particular
polynucleotide is obtainable. Each clone has been transfected into separate bacterial cells (E.
coli) in this composite deposit.

The clones may also be obtained from the Resource Center of the German Human
Genome Project (Heubner Weg 6, 14059 Berlin, GERMANY). The Resource Center library
numbers are slightly different than those presented here, but may be readily obtained by the
following key or with the assistance of Resource Center personnel.

The library name becomes a number: brain (hfbr2) becomes 564; kidney (hfkd2)
becomes 566; mammary carcinoma (hmcf1) becomes 727; testis (htes3) becomes 434; and
uterus (hute1) becomes 586. Next, the plate number is converted to two digits (e.g., "2"
becomes "02") and is moved behind the plate coordinate, and the underscore is dropped. The
following examples are helpful:

Listed Number          Resource Center Number

DKFZphfbr2_16f21       DKFZp564F2116
DKFZphfkd2_1j9         DKFZp566J091
DKFZphmcf1_1c23        DKFZp727C231
DKFZphtes3_14g5        DKFZp434G0514
DKFZphute1_17k7        DKFZp586K0717

The libraries were constructed using two commercially available vectors. The brain
(hfbr2 designations) and kidney (hfkd2 designations) libraries utilize pAMP1 from Life
Technologies and are maintained in XL-2Blue (Stratagene); the uterus (hute1), testes (htes3)
and mammary carcinoma (hmcf1) libraries are constructed in pSPORT1, also from Life
Technologies, and are maintained in DH10B (Life Technologies). In addition to the following
techniques, consultation with the commercial literature available on these clones will make
evident all of the housekeeping techniques needed to propagate and isolate the individual
constructs. All inserts may be excised with a NotI/SalI digestion. Alternatively, universal
primers, flanking the cloning region, may be used to amplify the inserts using PCR methods.
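The conversion from a listed number to a Resource Center number can be sketched in Python as follows. The function name `to_resource_center_number` and the regular expression are illustrative assumptions. The sketch follows the five example pairs listed above: in each of them the numeric part of the plate coordinate appears with two digits and the plate number is appended as written, so that is the behaviour implemented here.

```python
import re

# Library key taken from the text (library name -> Resource Center number).
LIBRARY_CODES = {
    "hfbr2": "564",  # human fetal brain
    "hfkd2": "566",  # human fetal kidney
    "hmcf1": "727",  # human mammary carcinoma
    "htes3": "434",  # human testis
    "hute1": "586",  # human uterus
}

# Assumed layout of a listed number: "DKFZp", library, "_", plate, coordinate.
_LISTED_RE = re.compile(r"DKFZp([a-z]+\d)_(\d+)([a-z])(\d+)")


def to_resource_center_number(listed: str) -> str:
    """Convert e.g. 'DKFZphfbr2_16f21' to 'DKFZp564F2116'."""
    match = _LISTED_RE.fullmatch(listed)
    if match is None:
        raise ValueError(f"unexpected clone name: {listed!r}")
    library, plate, row, column = match.groups()
    if library not in LIBRARY_CODES:
        raise ValueError(f"unknown library: {library!r}")
    # Pattern inferred from the examples: upper-case coordinate letter,
    # two-digit coordinate number, then the plate number.
    return f"DKFZp{LIBRARY_CODES[library]}{row.upper()}{int(column):02d}{int(plate)}"


for name in ("DKFZphfbr2_16f21", "DKFZphtes3_14g5"):
    print(name, "->", to_resource_center_number(name))
```

Because the mapping is inferred from only five examples, any converted number should be confirmed with Resource Center personnel before ordering a clone.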
Bacterial cells containing a particular clone can be obtained from the composite
deposit as follows: 

An oligonucleotide probe or probes should be designed to the sequence that is known 
for that particular clone. This sequence can be derived from the sequences provided herein, 
or from a combination of those sequences. Methods of probe design are presented below. 

Oligonucleotide probes may be labeled with γ-³²P ATP (specific activity 6000
Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling
oligonucleotides. Other, non-radioactive labeling techniques can also be used. 
Unincorporated label typically is removed by gel filtration chromatography or other 
established methods. The amount of radioactivity incorporated into the probe can be 
quantified by measurement in a scintillation counter. Preferably, specific activity of the 
resulting probe generally should be approximately 4X10^ dpm/pmole.

The bacterial culture containing the pool of full-length clones should preferably be 
thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of
sterile L-broth containing ampicillin at 50-100 µg/ml (for XL-2Blue strains 25 µg/ml
tetracycline should also be used). The culture should preferably be grown to saturation at
37°C., and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of
these dilutions should preferably be plated to determine the dilution and volume which will
yield approximately 5000 distinct and well-separated colonies on solid bacteriological media
containing L-broth containing ampicillin at 100 µg/ml (for XL-2Blue strains 25 µg/ml
tetracycline should also be used) and agar at 1.5% in a 150 mm petri dish when grown
overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can
also be employed. 

Standard colony hybridization procedures should then be used to transfer the colonies 
to nitrocellulose filters and lyse, denature and bake them. The filter is then preferably 
incubated at 65°C. for 1 hour with gentle agitation in 6 x SSC (20 x stock is 175.3 g
NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100
µg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter).
Preferably, the probe is then added to the hybridization mix at a concentration greater than or
equal to 1X10^ dpm/mL. The filter is then preferably incubated at 65°C. with gentle agitation
overnight. The filter is then preferably washed in 500 mL of 2 x SSC/0.5% SDS at room 
temperature without agitation, preferably followed by 500 mL of 2 x SSC/0.1% SDS at room 
temperature with gentle shaking for 15 minutes. A third wash with 0.1 x SSC/0.5% SDS at
65°C. for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to 
autoradiography for sufficient time to visualize the positives on the X-ray film. Other known 
hybridization methods can also be employed. 
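The hybridization and wash solutions are dilutions of the 20 x SSC stock. As a minimal sketch of the arithmetic (C1 x V1 = C2 x V2), the following Python helper computes how much stock and diluent are needed for a given final volume. The function name `dilute_stock` is hypothetical, and the calculation ignores the volume contributed by the other components (SDS, yeast RNA, EDTA), which is an assumed simplification.

```python
from typing import Tuple


def dilute_stock(stock_x: float, target_x: float, final_ml: float) -> Tuple[float, float]:
    """Return (mL of stock, mL of diluent) so that stock_x * V1 = target_x * final_ml."""
    if not 0 < target_x <= stock_x:
        raise ValueError("target concentration must be positive and not exceed the stock")
    if final_ml <= 0:
        raise ValueError("final volume must be positive")
    stock_ml = final_ml * target_x / stock_x
    return stock_ml, final_ml - stock_ml


# 6 x SSC prehybridization mix, about 10 mL per 150 mm filter
print(dilute_stock(20, 6, 10))
# 2 x SSC wash, 500 mL
print(dilute_stock(20, 2, 500))
```

Under these assumptions, 10 mL of 6 x SSC takes 3 mL of the 20 x stock, and 500 mL of 2 x SSC takes 50 mL.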

The positive colonies are picked, grown in culture, and plasmid DNA isolated using 
standard procedures. The clones can then be verified by restriction analysis, hybridization 
analysis, or DNA sequencing. 

Alternatively, clones may be grown as described above, and PCR used to isolate the 
insert DNAs. Methods of PCR are described below and are otherwise well known.

ERROR SCREENING

The DNA sequences found herein derive from individual clones, which are publicly 
available, as noted above. Thus, the skilled artisan will recognize that any specific sequence 
disclosed herein readily can be screened for errors by resequencing a particular fragment, in 
both directions (i.e., by sequencing both strands). Alternatively, error screening can be
performed by amplifying and/or cloning any of the inventive DNAs, using for example RT-PCR,
and sequencing the resulting amplified product. In the event that there is a sequencing
error, reference should be made to the deposited clone as the correct sequence.
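Screening by sequencing both strands amounts to checking that the read from one strand agrees with the reverse complement of the read from the other. A minimal Python sketch of that comparison follows. The function names and the example reads are hypothetical, and the sketch assumes the two reads cover exactly the same fragment with no insertions or deletions; real reads would first need to be aligned.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_discrepancies(forward_read: str, reverse_read: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, forward base, base implied by the reverse read)."""
    forward = forward_read.upper()
    implied = reverse_complement(reverse_read)
    if len(forward) != len(implied):
        raise ValueError("reads must cover the same fragment (equal length)")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(forward, implied), start=1)
        if a != b
    ]


# Hypothetical reads: the second reverse read disagrees at one position.
print(strand_discrepancies("ATGGCCTTAG", "CTAAGGCCAT"))
print(strand_discrepancies("ATGGCCTTAG", "CTAAGGACAT"))
```

Any position reported by the comparison is a candidate sequencing error to be resolved against the deposited clone.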

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible to a wide variety of uses, 
based on functional and/or structural properties. The skilled worker will appreciate, based on 
the biological activities detailed below, and discussed with regard to the individual sequences 
disclosed below, that the inventive molecules will find usefulness in numerous therapeutic and 
diagnostic applications. 

The DNA molecules, especially the potassium salts thereof, can be used as fertilizer 
supplements due to their high nitrogen and phosphorus contents. Since the DNAs are of 
defined length, they are also useful in gel electrophoresis as molecular weight markers. Due 
to their similarity with known molecules, certain of the DNA molecules and their variants and
derivatives may be used in any number of different diagnostic procedures and therapeutic 
applications. They may also be used to make the encoded proteins. 
The proteins themselves have many possible uses. They may be used as a nutritional
supplement for humans, animals and even for laboratory use as, for example, medium for 
bacterial cultures. Moreover, since the proteins are of defined, known sizes, they may be 
used as molecular weight markers for gel electrophoresis and gel filtration. Because they are 
of defined sequences, they also have use in microsequencing and protein fingerprinting 
applications. 

Expression Profiling Applications 

Given their known tissue expression and functional associations, assemblages of the 
inventive proteins (or corresponding antibodies) and nucleic acids are particularly suited to 
expression profiling applications. Expression profiling generally entails constructing an array 
of indicators that signal the presence of a particular RNA or protein expression product. Such
arrays can be used to evaluate, for example, pharmacological effectiveness and toxicity. In 
particular, expression profiles from such arrays can be generated from cells treated with 
known compounds, having known properties, and these profiles can be compared to profiles 
of unknowns to evaluate similarities and differences, which can be correlated with efficacy or
toxicity. 

Additional uses of profiling include diagnosis, tracking development, and ascertaining
signaling and metabolic pathways. For examples of references describing profiling and its
uses, see Farr et al., U.S. Patent No. 5,811,231 (1998); Seilhamer et al., U.S. Patent No.
5,840,484 (1998); Rine et al., U.S. Patent No. 5,777,888 (1998); WO 97/27317; WO 99/05323;
WO 99/09218; and WO 99/14369. For a device for implementing such techniques, see Lipshutz
et al., U.S. Patent No. 5,856,174 (1999) and Anderson et al., U.S. Patent No. 5,922,591
(1999).

In one embodiment, a subset of the inventive DNAs will be arrayed on a substrate, 
like a gene chip, a filter or a 96-well plate. Test samples containing cells are maintained in 
the presence of a label capable of incorporation into nascent mRNA. Samples are treated with 
test and control compounds, which will induce mRNA expression in the sample, resulting in 
incorporation of label. Whole mRNA is isolated and applied to the array such that it 
hybridizes with the DNAs contained therein. After washing, the amount of hybridization is 
quantified and a profile is generated. These steps are repeated with various control and test 
compounds, thereby generating a library of profiles, which can be used to ascertain the 
relationships relevant to pharmacological efficacy or toxicity. 
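Comparing the profile of an unknown compound against a library of reference profiles can be sketched as follows. The text does not prescribe a similarity measure; Pearson correlation is used here purely as an assumed, commonly used choice. The function names, compound names and signal values are hypothetical illustration data, not results.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length hybridization profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 spots)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (norm_x * norm_y)


def rank_references(unknown: Sequence[float],
                    library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank reference compounds by similarity of their profile to the unknown."""
    scores = [(name, pearson(unknown, profile)) for name, profile in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical signal intensities for four array positions.
library = {
    "reference_compound_A": [5.0, 1.0, 0.5, 4.0],
    "reference_compound_B": [1.0, 1.0, 1.2, 0.9],
}
unknown = [4.5, 1.2, 0.6, 3.8]
for name, score in rank_references(unknown, library):
    print(f"{name}: r = {score:.2f}")
```

In this made-up example the unknown ranks closest to reference_compound_A; whether such similarity reflects shared efficacy or toxicity would still have to be established experimentally.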
The matrices used in such profiling, however, need not be limited to those utilizing
DNAs. Rather, other nucleic acids, like RNAs and protein nucleic acids (PNAs), as well as 
the inventive proteins and antibodies corresponding to the inventive proteins may also be 
employed. Hence, for example, antibodies could form the array and the samples could be 
treated in order to label nascent proteins. Whole proteins then would be isolated and applied 
to the antibody matrix. Developing the resulting signal would result in a protein expression 
profile, which is useful in essentially the same manner as the nucleic acid profile. A protein 
matrix could be used, for example, in evaluating antibody responses to pharmaceutical agents 
in order to eliminate possible cross-reactivity. 

Moreover, where nucleic acids are used in the matrix, it is often beneficial to use 
variants (as defined below) of the molecules described herein. This can be used to account 
for genetic variations that are of little or no consequence to the function of the resultant gene 
product. Hence, they can account for wobble or conservative amino acid variations that do 
not perturb function, like variations in some of the protein motifs elucidated below. Thus,
each position in the matrix can employ multiple nucleic acid probes that account for a series 
of variants. 
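To illustrate how a set of probes can account for wobble, the following Python sketch enumerates every DNA sequence that encodes the same peptide as a given in-frame coding fragment, using the standard genetic code. The function names are hypothetical, and exhaustive enumeration is an assumed simplification: the number of variants grows multiplicatively with fragment length, so a practical matrix would use a selected subset.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment (length divisible by 3)."""
    if len(dna) % 3:
        raise ValueError("fragment length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_probes(dna: str) -> List[str]:
    """All DNA sequences encoding the same amino acids as the fragment."""
    codons_for: Dict[str, List[str]] = {}
    for codon, amino in CODON_TABLE.items():
        codons_for.setdefault(amino, []).append(codon)
    choices = [codons_for[amino] for amino in translate(dna.upper())]
    return ["".join(combo) for combo in product(*choices)]


print(synonymous_probes("ATGTGGAAA"))  # Met-Trp-Lys: only the Lys codon varies
```

Conservative amino acid substitutions could be handled the same way by widening the set of codons allowed at a position, which is not shown here.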

Expression profiling may also be done, in another embodiment, using two- 
dimensional protein gels in which the inventive proteins are detected. The resultant profiles 
can be used in the same way as described. 

Matrices useful for profiling may be constructed based on different criteria. Of 
course, the more relevant profiles will take into account expression of most human genes, 
preferably all of them. In certain situations, however, it is advantageous to look at a smaller 
subset. For example, if one were concerned about fetal neural toxicity, a fetal brain-specific 
matrix might be chosen. On the other hand, if one were interested in targeting mammary 
carcinoma tissue, a corresponding matrix could be used. Thus, matrices may be constructed 
using all of the sequences available from a tissue-specific library. 

* * *

The following discussion relates to some of the various functional and structural 
groupings that would be of interest to the artisan wishing to construct profiling matrices. 
Of course, the artisan will also recognize that these functional descriptions may find
additional applicability in the therapeutic and diagnostic applications discussed below. 
Cell Cycle

A proliferating cell must coordinate replication and chromosomal separation to ensure
that the genome is replicated completely, and that a single copy is correctly inherited by each
daughter cell. The cell cycle is the coordinated series of events that achieves these aims.
Many of the key events are initiated by a family of conserved serine/threonine protein
kinases, the cyclin-dependent kinases (CDKs), that are activated by the cyclin family of 
proteins (cyclins A-H). In turn, the cyclin-CDK complexes are modulated by other protein 
kinases or phosphatases, and by binding specific inhibitor proteins. The enormous variety of 
ways in which CDK activity can be regulated allows the cell to respond to internal signals 
generated by preceding events in the cell cycle and to external growth signals. 

The somatic cell cycle is divided into four phases: DNA replication (S phase) and 
chromosome separation (M phase) are separated by gap phases (G1 and G2). At specific
control points the decision to begin the next stage (DNA synthesis or mitosis) is carefully
regulated. 
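
The ordering of the four phases can be sketched as a simple cyclic sequence. The Python below is an illustrative model only: the names `PHASES`, `next_phase` and `at_control_point` are assumptions introduced here, and the control points are reduced to "the gap phase preceding S or M" as the paragraph above describes.

```python
# Phases in the order a somatic cell traverses them.
PHASES = ("G1", "S", "G2", "M")


def next_phase(phase: str) -> str:
    """Return the phase that follows `phase`, wrapping from M back to G1."""
    index = PHASES.index(phase)  # raises ValueError for an unknown phase
    return PHASES[(index + 1) % len(PHASES)]


def at_control_point(phase: str) -> bool:
    """True for the gap phases, where entry into S or M is decided."""
    return phase in ("G1", "G2")


one_full_cycle = ["G1"]
for _ in range(len(PHASES)):
    one_full_cycle.append(next_phase(one_full_cycle[-1]))
```

After the loop, `one_full_cycle` starts and ends at G1, reflecting that the daughter cell re-enters the first gap phase.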

Cdc2, the primary kinase, is especially required for the G1-S transition and S phase. 
Cdc4 and Cdc6 are involved at the restriction point, where the cell can decide to proliferate or 
arrest (G1<->G0) and Cdc7 is a CDK activating kinase (CAK) as well as a subunit of TFIIH. 

The cyclin-CDK complexes are regulated in various ways. One is through 
phosphorylation by CDK activating kinases (CAK), like the Y15 kinase (Wee1), and 
dephosphorylation by CDK associated phosphatases (CAP), like Cdc25A, a member of the 
Cdc25 family (Cdc25A, B and C). 

Another mode of regulation occurs through two classes of CDK inhibitors (CKIs): first, the 
INK4 proteins p15, p16, p18, and p19, which negatively regulate the cyclin D-CDK 
complexes, and second, the p21 family, comprising p21, p27, and p57. 

The cell cycle is also regulated through ubiquitin-mediated proteolysis involving the 
destruction of both cyclins and CDK inhibitors by the 26S proteasome, which requires a 
ubiquitin-conjugating enzyme (UBC) and a ubiquitin ligase. The instability is conferred by 
PEST regions (cyclins D and E) or a ten amino acid region in the amino terminus (degradation 
box) in the A- and B-type cyclins. 

All these modifications play an important role in cellular localization, because 
only the nuclear CDK-cyclin complexes are functional in the cell cycle. During G1 phase of the 
cell cycle, cyclins A, E and D are synthesized and bind to their cyclin-dependent kinase 
(CDK) partners. CDK complexes containing cyclins A, E and D1 are then imported into and 
concentrated within nuclei. Cdk6-cyclin D3 has been localized to both cytoplasmic and 
nuclear compartments, although only the nuclear complex is active. As cells enter S phase, 
cyclin A and cyclin E complexes remain within the nucleus, whereas cyclin D1 relocalizes to 
the cytoplasm for proteolysis at the onset of S phase. Like Cdk2-cyclin A, Cdc2-cyclin A is 
nuclear and remains so until it is degraded during mitosis. By contrast, as a result of ongoing 
nuclear import and more rapid re-export, cyclin B1, which binds to Cdc2 upon synthesis 
during S phase, is predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic, although 
this might occur through anchoring of the complex to some cytoplasmic constituent. At 
prophase, phosphorylation of cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the 
nucleus, whereas cyclin B2 remains in the cytoplasm until nuclear envelope breakdown. 

Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C, are responsible 
for the G2 to M control point. Wee1 is a nuclear protein throughout the cell cycle, whereas 
Cdc25C binds to 14-3-3 proteins during interphase and remains predominantly cytoplasmic. 
In some systems Cdc25C, like cyclin B1, moves abruptly into the nucleus just before 
entry into mitosis. 

The 110-kDa retinoblastoma (tumor suppressor) protein (RB), a pRB-family member, 
is an important regulator of cell-cycle progression and differentiation. Like the E2F family 
(E2F1-5) or DP family (DP1-3) of transcription activators, RB suppresses inappropriate 
proliferation by arresting cells in G1 by repressing the transcription of genes required for the 
transition into S phase. Before the cell proceeds into S phase, RB becomes phosphorylated at 
multiple sites by the cyclin-dependent protein kinases (CDKs) and loses its transcriptional 
repressing activity. Phosphorylation of RB during late G1 phase results in the dissociation of 
the E2F-RB repressor complex, which allows S-phase specific genes to be transcribed. Cyclin 
E is the evolutionarily conserved target for E2F and interacts together with CDC2 in late G1. 

For a proliferating cell it is vital that only undamaged DNA is replicated, because if 
DNA damage is substantial, its replication can lead to chromosome loss or rearrangement. 
Thus, we find a G1<->S checkpoint in late G1 that requires tumor suppressor p53. A p53- 
dependent G1 arrest is effected by the cyclin-dependent kinase inhibitor p21, whose elevated 
expression inhibits almost all cyclin-CDK complexes. 

The kinase responsible for phosphorylating the unidentified kinetochore component 
in metaphase may be a member of the MAP kinase family and appears to be the proto- 
oncogene c-MOS, a cytostatic factor (CSF) in meiosis. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Cell cycle" and include, among others, the following: 

Tumor suppressors: Tumour-suppressor genes are known to be involved in 
the control of cell growth and division, interacting with proteins which control the cell cycle. 
The N33 gene is significantly methylated in tumour cells, a mechanism by which tumor- 
suppressor genes are inactivated in cancer. The N33 gene has been reported by OMIN 
(Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/htbin-post/Omin) to 
be associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following disease: prostate cancer suppression (OMIN *601385). Clones in this category 
include: fbr2_2kl4. 

C-TAK1 (Cdc25C associated protein kinase): Cdc25C is a protein phosphatase that controls 
entry into mitosis by dephosphorylation of Cdc2. Cdc25C function is itself regulated by 
phosphorylation. Serine 216 phosphorylation of Cdc25C mediates the binding of 14-3-3 
protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein kinase) phosphorylates 
Cdc25C on serine 216 in vitro. Alterations in the gene coding for the above protein kinase 
have been reported by OMIN to be associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with pancreatic cancer (OMIN *60278). Clones in this category 
include: tes3_7j3. 

Cell structure and motility 

One of the major differences between prokaryotes and eukaryotes is the ability of the 
eukaryotic cell to adopt very different shapes depending on its function during the 
differentiation process. Animal cells vary from round to extended cylindrical forms like 
motoneurons or muscle cells. In humans, more than 100 different cell types can be 
distinguished, each having a characteristic shape. The form of a cell often is closely related to 
its capacity to move. Some completely differentiated cells like fibroblasts can still change 
their form actively, thereby migrating. Other cell types serve as motor elements - 
"macroscopically" like muscle cells or "microscopically" like ciliated epithelia. Such tasks 
are fulfilled by a large class of proteins, responsible on the one hand for maintenance of cell 
structure and contact with neighboring cells or the intercellular matrix and on the other hand for 
cell motility. These topics cannot be regarded separately: the motility apparatus, for example, 
must be anchored in the cytoskeleton. Three different types of filaments can be distinguished: 
actin filaments, tubulin filaments and intermediate filaments, each present in almost all types of 
cells. 

Actin filaments (F-actin) are built up of monomers (G-actin). In muscle cells, actin and 
myosin, for both of which several paralogous genes are known, as well as many more 
proteins, are constituents of the contractile apparatus. 

The "thin" and "thick filaments" in a muscle cell consist mainly of actin and myosin, 
respectively. 

Several different proteins are responsible for the anchoring of the actin filaments in 
the Z-disks (e.g. alpha-actinin and desmin) or at the end of the myofibers in the cell 
membrane. 

Troponins I, C and T and tropomyosin, which are associated with actin, confer the Ca++- 
dependent triggering of contraction. 

Length of the sarcomere is controlled by the giant protein titin. 

In smooth muscle, there is no troponin. Instead, contractile activity is controlled by 
phosphorylation and dephosphorylation of myosin by a specialized kinase. Contractile 
fibers are not organized in sarcomeres. 

Apart from contributing to muscle contraction, the actomyosin system is responsible 
for many other motions at cellular level, e.g. the amoeboid movement of pseudopodia or the 
fission of cells at the end of mitosis by a contractile ring. 

Besides this, actin fibers fulfill structural tasks like maintenance of the shape of 
stereocilia or microvilli. Here, actin filaments are connected by proteins like fimbrin. 
Actin fibers are not confined to such specialized structures, however. There is a network 
covering the complete cell volume with F-actin as a major constituent. Whereas the actin 
filaments in the structures mentioned above are relatively stable, this F-actin is highly 
dynamic. Management of the network structure is achieved by connecting 
proteins like alpha-actinin, fimbrin or filamin; turnover is regulated by gelsolin, villin, and 
different capping and fragmentation proteins. 

Microtubules are built up of alpha-beta tubulin heterodimers. Turnover of filaments is 
achieved by incorporation and release of subunits with different rate constants at the two 
ends. The resulting cycle is called "treadmilling". Thirteen strings of tubulin duplets build up 
one subfiber, whereas one fiber contains two or three of those. A complete axoneme consists 
of 9 radial and 2 central fibers. This "9+2" structure is the basis of flagella, their basal 
bodies and centrioles. In flagella, several additional structures like radial elements exist. 
Nexin connects the fibers and dynein is the motor ATPase which shifts the fibers relative to 
each other. Several genetic diseases like the Kartagener syndrome are caused by 
deficiencies of distinct proteins in cilia. 

Besides this, microtubules are abundant in all types of cells. They are part of a 
delivery system for organelles, e.g. in the Golgi apparatus. A further very important system 
based on microtubules is the mitotic spindle, which is organized by the centrosomes. Besides 
many other components, the major part of a centrosome consists of two centrioles, which are 
built up of nine microtubule triplets. Most remarkably, new centrioles are not synthesized de 
novo but generated by duplication of old ones. 

Cytoplasmic microtubules are associated with many different proteins. Two major 
classes are known: the MAPs ("microtubule-associated proteins", with molecular masses 
between 200 and 300 kD) and the much smaller tau proteins with a MW between 60 and 70 
kD. These proteins regulate the treadmilling process and the interaction with other structures in 
the cell. 

Besides actin filaments and microtubules, the so-called intermediate filaments constitute a 
third class of filaments. In contrast to the former two groups, they do not participate in 
motility, nor are they dynamic structures subject to rapid turnover. The most important ones are 
neurofilaments (in neurons), keratin filaments (mainly in epithelial cells), and vimentin 
filaments (in many different cell types). 


The biological function of both the cytoskeleton and the contractile apparatus of a 
cell does not end at the cell membrane. Cells must be embedded in the extracellular matrix, 
all cells of a muscle must act as one single mechanical unit and epithelia must resist 
macroscopic mechanical forces. Hence, cell adhesion and the extracellular matrix are closely 
connected to the cytoskeleton. Vinculin is one of the proteins which serve as an anchor for 
intracellular fibers (actin). Different types of desmosomes and tight junctions connect 
neighboring cells with intercellular fibers. On the inside, cytoplasmic plaques connect them to 
the cytoskeleton. These structures serve as mechanical elements, whereas 
gap junctions connect cells metabolically. 

The extracellular matrix consists of a network of proteins, glycoproteins and 
polysaccharides. Different proteins are present in relation to different mechanical demands. 
Elastin is found in tissues with high elasticity (lungs, heart) whereas collagen, a more hard- 
wearing protein, is found in tendons and ligaments. Fibronectin is an extracellular protein 
highly important for cell adhesion. 

Reference: Murray J et al (1992): Cell Motil Cytoskeleton 22: 211-223. 

Within the overall group of Cell Structure and Motility several categories of proteins 
are coded for by clones of the invention: 

Collagen alpha chain proteins: These proteins carry the typical (xxG)n repeat of collagen 
proteins and Pfam von Willebrand factor type A domain(s), suggesting that they are collagen 
alpha chains. These proteins can find application in modulation of connective tissue, bone and 
cartilage development and maintenance. OMIN reports collagen alpha chains have 
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases: 1) Osteogenesis imperfecta, type I (OMIN #166200); 2) Osteogenesis 
imperfecta congenita (OMIN #166210); 3) Alport Syndrome, X-linked (OMIN #301050); 4) 
Thrombasthenia of Glanzmann and Naegeli (OMIN *273800); 5) Ehlers-Danlos Syndrome, 
Type VII (OMIN #130060); 6) Marfan Syndrome (OMIN #154700); 7) Alport Syndrome, 
Autosomal Recessive (OMIN #203780); 8) Alpha-2-Deficient Collagen Disease (OMIN 
203760); 9) Goodpasture Syndrome (OMIN 233450); 10) Osteogenesis Imperfecta, 
progressively deforming, with normal sclerae (OMIN #259420); 11) Ehlers-Danlos 
Syndrome, Type VII Autosomal Recessive (OMIN *225410); and 12) Osteogenesis 
imperfecta, Type IV (OMIN #166220). OMIN reports that von Willebrand factor type A 
domains have associations (as potentially diagnostic, therapeutic, causative, and/or related, 
etc.) with the following diseases: 1) Hemophilia A (OMIN *306700); 2) Von Willebrand 
Disease (OMIN *193400); 3) Giant Platelet Syndrome (OMIN *231200); 4) Thrombasthenia 
of Glanzmann and Naegeli (OMIN *273800); 5) Congenital Thrombotic Disease due to 
protein C deficiency (OMIN #176860); 6) Polycystic Kidney Disease 1 (OMIN *601313); 7) 
Nephrogenic Diabetes Insipidus (OMIN *304800); 8) Factor V Deficiency (OMIN ^227400); 
and 9) Dentatorubral-Pallidoluysian Atrophy (OMIN *125370). Clones in this category 
include: fbr2_2b5. 
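
The (xxG)n repeat mentioned above, in which every third residue is glycine, lends itself to a simple pattern search. The following Python sketch is illustrative only and is not the method used to classify the clones of the invention; the function name `longest_xxg_run`, the `min_repeats` threshold and the test sequences are assumptions chosen for the example.

```python
import re


def longest_xxg_run(protein_seq: str, min_repeats: int = 5) -> int:
    """Return the repeat count of the longest (xxG)n run, or 0 if none.

    A run is a stretch of consecutive triplets whose third residue is
    glycine (G); the first two positions may be any residue letter.
    Runs shorter than `min_repeats` triplets are ignored.
    """
    pattern = re.compile(r"(?:[A-Z]{2}G){%d,}" % min_repeats)
    best = 0
    for match in pattern.finditer(protein_seq.upper()):
        best = max(best, len(match.group(0)) // 3)
    return best


# Made-up sequences for illustration, not sequences of the invention.
collagen_like = "MK" + "PPG" * 6 + "AAAA"
globular_like = "MKTAYIAKQR"
```

A sequence such as `collagen_like` contains six consecutive triplets ending in glycine, whereas `globular_like` contains none; a real analysis would combine such a screen with domain searches such as the Pfam assignment named above.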

Radial spokehead protein: Radial spokehead proteins, e.g., Chlamydomonas 
reinhardtii radial spokehead protein of flagella or axoneme and the Strongylocentrotus 
purpuratus sea urchin spermatozoa protein p63, and human proteins with similarity thereto 
are important for the maintenance of a planar form of sperm flagellar beating. The human 
protein(s) can find application in modulating the structure of the human spermatozoa radial 
spoke head and modulation of sperm motility in men (e.g., in sterility). Clones in this 
category include: tes3_15i5. 

Ankyrins: Ankyrins are peripheral membrane proteins which interconnect integral 
proteins with the spectrin-based membrane skeleton. Thus these proteins are involved in 
coupling of cytoskeleton and cell membrane. OMIN reports that ankyrins have associations 
(as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
diseases: 1) Hereditary Spherocytosis (OMIN *182900); 2) Hemolytic Poikilocytic Anemia 
due to reduced ankyrin binding sites (OMIN 141700); 3) Atypical Elliptocytosis (OMIN 
225450); 4) Autosomal recessive spherocytosis (OMIN #270970); 5) Werner Syndrome 
(OMIN ^277700); and 6) Rhesus-unlinked type Elliptocytosis (OMIN #130600). Clones in 
this category include: tes3_1817. 

FGD1-related F-actin binding protein (Frabin/FGD1): FGD1-related F-actin-binding 
protein (Frabin/FGD1) is a novel F-actin-binding protein. The gene locus fgd1 seems to be 
responsible for faciogenital dysplasia or Aarskog-Scott syndrome (OMIN 305400). Frabin 
binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 
cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase activation, as 
described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 and 
the actin cytoskeleton. Cdc42p is an esin yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated protein 
morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events 
and induces the JNK/SAPK protein kinase cascade, which leads to the activation of 
transcription factors within the nucleus. Clones in this category include: tes3_72kl5. 

Paramyosins: Paramyosin is a major structural component of thick filaments in 
invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. Clones in this category include: tes3_7b22. 

Tuftelin: Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved 
in calcification, these proteins are also expressed in the uterus matrix. The new protein can 
find application in modulation of tissue calcification, especially in the uterus. As reported by 
OMIN, tuftelin has been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with amelogenesis imperfecta (OMIN *600087). Clones in this category 
include: utel 19g22. 

Cell Adhesion Regulator (CAR1): CAR1 is involved in the regulation of cell-cell 
adhesion. OMIN reports the association (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) of CAR1 with tumor suppression by the reduction of tumor invasion 
(OMIN *116935). Clones in this category include: utel_24j6. 

Differentiation/Development 

Almost every multicellular organism originates from meiotic cell divisions and the 
recombination of a paternal and a maternal set of chromosomes. After fertilization of the egg, 
all cells of a body originate from this one cell. Thus the cells of the developing body are 
initially genetically alike. But phenotypically they become very different. They are 
specialized to a certain cell type and arranged in an organized pattern to a certain type of 
tissue and the whole structure has the well-defined shape of an organ. All these features are 
determined by the DNA sequence of the genome, which is reproduced in every cell. Each cell 
acts on the genetic instructions given at a certain time and at a certain place of development 
and plays its individual part in the multicellular organism. Cell differentiation may be divided 
into three general steps: cell cycle exit, apoptosis protection and tissue-specific gene 
expression. These processes are coordinated to provide the final and unique tissue 
characteristics. 

An animal cell that has achieved a certain level of development is said to be 
determined. This differentiation of a cell may be irreversible and in that case the cell may be 
renewed only by simple duplication. Other cells are renewed by means of stem cells which 
are immortal (e.g. stem cells of the bone marrow, epidermal stem cells). The genetic control 
of development is extensively studied in non-vertebrates and vertebrates. The classical animal 
model is the fruit fly Drosophila and the modern model is the transgenic mouse. Animal 
transgenesis has proven to be useful for physiological as well as physiopathological studies. 
Besides the approach based on the random integration of a DNA construct in the mouse 
genome, gene targeting can be achieved using totipotent embryonic stem cells for targeted 
transgenesis. Transgenic mice are then derived from the embryonic stem cells. This allows 
the introduction of null mutations in the genome (so-called knock-out) or the control of the 
transgene expression by the endogenous regulatory sequence of the gene of interest (so- 
called knock-in). Mice can be created that express wild-type genes, mutant genes, marker 
genes or cell lethal genes in a tissue-specific manner. These animal models make it possible to 
follow changes in tissue and organ development and lead to a better understanding of the 
cellular function of many genes or to the generation of animal models for human diseases. 
Fundamental problems in immunology, onset and development of cancer, regulation in fatty 
acid metabolism, aspects of cardiovascular function, control of the central nervous system 
development, analysis of reproductive development and function are only some examples of 
research interests. 

The final stage of cell differentiation is growth arrest. In animal tissues with rapid cell 
turnover, terminally differentiated cells undergo programmed cell death. The cells have the 
ability to kill themselves by activating an intrinsic cell suicide program when they are no 
longer needed or have become seriously damaged. The execution of this program is termed 
apoptosis. Apoptosis is of importance for development and homeostasis of animals. The key 
components of this program have been conserved in evolution from worms (C. elegans) to 
insects (Drosophila) to humans. The roles of apoptosis include the sculpting of structures 
during development, deletion of unneeded cells and tissues, regulation of growth and cell 
number, and the elimination of abnormal and potentially dangerous cells. In this way 
apoptosis provides a "quality control mechanism" that limits the accumulation of harmful cells, 
such as virus-infected cells and tumor cells. On the other hand, inappropriate apoptosis is 
associated with a wide variety of diseases, including AIDS, neurodegenerative disorders and 
ischemic stroke. Because it is now clear that apoptosis is a result of an active, gene-directed 
process, it should eventually be possible to manipulate this form of cell death by developing 
drugs that interact with its recently identified mechanisms of action. Inducers of cell 
differentiation, cell cycle arrest and apoptosis might be novel molecular targets for new 
anticancer agents in addition to the signaling pathways for growth factors and cytokines. 

Proteins, factors, receptors and genes of importance in apoptosis : 

Proteases: 

- Calpain, an intracellular cysteine protease, exact role unknown. 

- Caspase-1 to Caspase-11, a family of proteases, each synthesized as an inactive 
proenzyme. Targets of the activated enzymes include: poly(ADP-ribose) polymerase, DNA- 
dependent protein kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton 
components (actin). 

- Granzyme B, a serine protease released by cytotoxic T-cells. 

Receptors: 

- CD95 (synonyms: Fas, APO-1), a receptor protein of the TNF-receptor family, 
which includes TNF-R1 and TNF-R2, with the common characteristic of a 70 amino acid 
cytoplasmic domain. 

- FADD (synonym: MORT-1), a cytoplasmic protein 

- DR-3 (synonym: APO-3) a member of the TNF-receptor-family 

- DR-4 and DR-5 

Genes: 

- ced-3, ced-4 and ced-9 encode the general apoptotic and antiapoptotic program in 
Caenorhabditis elegans. Apaf-3 is the mammalian homologue of ced-3. 

- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak: a large gene family that can either inhibit or 
promote apoptosis. 

- Cytokine response modifier A, a cowpox virus gene whose gene product inhibits 
caspases. 

Others: 

- Caspase-activated DNase (CAD), which causes DNA fragmentation in the nucleus, 
and its inhibitor (ICAD). 

- Ceramide, a complex lipid that acts as a second messenger. 

- c-Jun N-terminal kinase (JNK) is a proline-directed kinase 

- p53 protein, is essential for the induction of apoptosis as a response to chromosomal 
damage. 

- RAIDD, a death signal-transducing protein. 

- Receptor interacting protein (RIP) is an accessory protein with a death domain and a 
serine/threonine kinase activity. 

- Sphingomyelinase, an enzyme that hydrolyzes the complex lipid sphingomyelin to 
ceramide. 

- Tumor necrosis factor (TNF), a type II membrane protein. 

- TNF-receptor associated factor (TRAF2), an accessory protein that can bind to 
both TNF-R1 and TNF-R2. 

Within the overall group of Differentiation/Development, several categories of 
proteins are coded for by clones of the invention: 

Interleukins (e.g. Interleukin-7): Interleukin precursors related to interleukin-7, for 
example, are expected to act as new growth factors for human B lineage cells. Additionally, 
these proteins should induce the gene rearrangement of the T-cell receptor repertoire, leading 
to thymocyte commitment, and subsequently induce both cytotoxic T-cell- and lymphocyte- 
activated killer cells. These interleukins could find clinical application in a variety of 
conditions of hematolymphopoietic failure and different tumours, because of their recruitment 
of B cell lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells (OMIN 
*146660). Clones in this category include: tes3_35e21. 

Testis-specific Y-encoded proteins: The TSPY genes are arranged in clusters on the Y 
chromosome of many mammalian species. TSPY is believed to function in early 
spermatogenesis and is a candidate for GBY, the putative gonadoblastoma-inducing gene on 
the Y. Proteins of the TSPY-SET-NAP1L1 family represent proteins closely related to 
TSPY. These proteins seem to be involved in early spermatogenesis. Clones in this category 
include: fbr2__2dl5. 

Intracellular transport and trafficking 

Eukaryotic cells rely for their viability on the partitioning of many basic cellular 
processes into membrane-bounded organelles. These are the nucleus, endoplasmic reticulum 
(ER), Golgi apparatus, endosomes, lysosomal compartments, mitochondria and peroxisomes. 
Most molecules destined for the lysosome, cell surface and outside the cell are routed through 
the ER and Golgi, which together with the vesicular intermediates between them, comprise 
the secretory pathway (Palade 1975). In the ER and Golgi compartments proteins are sorted, 
modified and often assembled into complexes en route to their final destination. Incorrectly 
assembled proteins are retained in the ER until they fold correctly or are targeted for 
degradation. Additional proteins are translocated into and function within the lumenal spaces 
of organelles or are secreted. Thus a large proportion of proteins synthesized require targeting 
to membranes either for insertion into or transport across them. A major purpose of this is 
growth. The secretory pathway is dependent on an intact cytoskeleton and also closely linked 
to general metabolism by affecting ribosome biogenesis (Mizuta and Warner, 1994). A huge 
number of proteins is required for targeting, translocation and sorting of newly synthesized 
proteins. 

The first step in sorting is the recognition of cis-acting targeting or signal sequences 
that organelle-targeted proteins contain. This is carried out by cytosolic targeting factors 
and/or receptors on the membrane to which the protein is targeted. In some cases the primary 
sequences are extremely degenerate, with only the overall character being conserved 
(hydrophobicity for an ER signal sequence, helical amphiphilicity for a mitochondrial targeting 
sequence) (Kaiser et al., 1987; Lemire et al., 1989). Following the targeting step, proteins are 
either inserted into or transported across the membrane (translocated) through a proteinaceous 
apparatus (termed the translocon). Translocons include or recruit motors to drive the 
translocation process in the correct direction (Schatz and Dobberstein, 1996). 
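
Because an ER signal sequence is conserved in overall hydrophobic character rather than in primary sequence, a sliding-window hydropathy average is one way such a feature could be scored. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an assumption made here for illustration and is not a method named in the references above; the window length of 7 and the function name `max_window_hydropathy` are likewise arbitrary choices.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein_seq: str, window: int = 7) -> float:
    """Return the highest mean hydropathy over any `window`-residue stretch."""
    seq = protein_seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    best = float("-inf")
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        best = max(best, mean)
    return best


# Made-up peptides: a leucine-rich stretch versus a charged stretch.
hydrophobic_peptide = "MKLLLLLLLLAG"
charged_peptide = "MKDEDEKRDEDE"
```

A high maximum for `hydrophobic_peptide` and a negative one for `charged_peptide` illustrate how overall character, rather than an exact motif, can distinguish candidate signal sequences.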
Defined intracellular protein transport steps: 

• ER 

- targeting to the ER 

- translocation into the lumen of the ER, and, depending on the presence of 
certain signals in the peptide sequence transport through the golgi complex 

• Mitochondria 

- targeting 

- translocation 

• Peroxisomes 

• The general secretory pathway 

- protein modification, assembly and quality control in the ER 

- vesicle-mediated trafficking 

- vesicle docking and fusion 

- transport through the golgi apparatus and sorting at the trans-golgi 

- transport to the cell surface 

- transport routes to the lysosome 

• Endocytosis 

• Specialized protein transport routes 

• Protein export from the cytoplasm 

References: Palade, G (1975) Science 189: 347-358; Mizuta et al. (1994) Mol Cell 
Biol 14: 2493-2502; Kaiser et al. (1987) Science 235: 312-317; Lemire et al. (1989) J Biol 
Chem 264: 20206-20215; Schatz et al. (1996) Science 271: 1519-1526. 

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is a prerequisite for a tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Südhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a SNAP- 
25 molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, suggesting that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle,
most likely through a secondary factor termed a GDI dissociation factor (GDF), which
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound
conformation, the Rab is then free to associate with its specific set of effectors, which can in
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining
GDP-bound Rab can then participate in a new round of fusion.
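
The Rab cycle described above can be read as a small state machine. The following Python sketch is purely illustrative: the state names, the TRANSITIONS table, and the functions step and run_cycle are hypothetical labels introduced here, and the model ignores effectors, kinetics, and any alternative routes between states.

```python
from enum import Enum


class RabState(Enum):
    # Hypothetical state labels summarizing the cycle in the text.
    CYTOSOLIC_GDI = "GDP-bound, complexed with GDI in the cytosol"
    MEMBRANE_GDP = "GDP-bound, attached to a membrane"
    MEMBRANE_GTP = "GTP-bound, attached to a membrane (can bind effectors)"


# (current state, regulatory factor) -> next state
TRANSITIONS = {
    # GDF displaces GDI and the Rab is delivered to a membrane.
    (RabState.CYTOSOLIC_GDI, "GDF"): RabState.MEMBRANE_GDP,
    # GEF promotes release of GDP and loading of GTP.
    (RabState.MEMBRANE_GDP, "GEF"): RabState.MEMBRANE_GTP,
    # GAP accelerates GTP hydrolysis, switching the GTPase off.
    (RabState.MEMBRANE_GTP, "GAP"): RabState.MEMBRANE_GDP,
    # GDI extracts the GDP-bound Rab from the membrane.
    (RabState.MEMBRANE_GDP, "GDI"): RabState.CYTOSOLIC_GDI,
}


def step(state, factor):
    """Return the state reached when `factor` acts on a Rab in `state`."""
    try:
        return TRANSITIONS[(state, factor)]
    except KeyError:
        raise ValueError(f"{factor} does not act on a Rab in state {state.name}")


def run_cycle(start, factors):
    """Apply the factors in order and return every state visited."""
    states = [start]
    for factor in factors:
        states.append(step(states[-1], factor))
    return states


if __name__ == "__main__":
    for visited in run_cycle(RabState.CYTOSOLIC_GDI, ["GDF", "GEF", "GAP", "GDI"]):
        print(visited.name, "-", visited.value)
```

Applying the factors in the order GDF, GEF, GAP, GDI returns the Rab to its cytosolic, GDI-bound starting state, which mirrors one full round of the cycle as described.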

Rab interactions with effectors are likely to regulate vesicle targeting and membrane
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport.
Vesicles are transported from their site of origin to acceptor compartments, likely through
associations with cytoskeletal elements and transport motors. A protein has been identified
with a domain structure that suggests a connection between the cytoskeleton and the Rabs.
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by
a coiled-coil stalk region and a RBD that specifically binds Rab6 (Echard et al., 1998). An
additional link with the cytoskeleton is provided by the Rab effector, Rabphilin-3A.
Rabphilin-3A has been shown in vitro to interact with α-actinin, an actin-bundling protein, but
only when not bound to Rab3A (Kato et al., 1996). These results raise the intriguing
possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby
play an active role in targeting vesicles to their appropriate destinations.

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step.
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve
as molecular tethers. Each effector protein contains a RBD, followed by a linker region (some
having the potential to form elongated coiled-coil structures), and a domain capable of
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex
may therefore function as a landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion site, Rab proteins may
selectively activate the SNARE fusion machinery. The mechanism of this activation is
unknown but may involve direct interactions of Rabs or, more likely, their effectors with
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2,
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al.,
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of
SNAREs.

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885; Echard et al. (1998)
Science 279, 580-585; Geppert et al. (1998) Annu. Rev. Neurosci. 21, 75-95; Guo et al.
(1999) EMBO J. 18, 1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778;
Novick and Zerial (1997) Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999) Curr. Biol.
9, 159-162; Poirier et al. (1998) Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17,
1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al. (1999) J. Biol. Chem. 274,
5649-5653.

Within the overall group of Intracellular Transport and Trafficking several categories 
of proteins are coded for by clones of the invention. 
Rab proteins : 

Rab1B is essential for the intracellular transport of nascent low density lipoprotein
(LDL) receptor. It is discussed as a universal mediator of endoplasmic reticulum to Golgi
transport of membrane glycoproteins in mammalian cells. Clones in this category include:
fbr2^2il7, fbr2_3bl6.

Rab10 appears concentrated on membranes in the perinuclear region. Rab10 has been
associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases as reported by OMIM: 1) Choroideremia (OMIM *303199); and 2) Rett
syndrome (OMIM 312750). Clones in this category include: fbr2_62119.

In mice, Rab17 shows epithelial cell specificity. Rab17 is discussed as a candidate gene
for the mouse mutations ln (leaden), Tw (twirler), and ax (ataxia). Cloned from a brain cDNA
library, the new putative Rab protein is expected to be involved in vesicle trafficking within
neuronal cells. These proteins can find application in modulating the transport of vesicles
inside neuronal cells, which are essential for development of functional dendritic processes.
Clones in this category include: fbr2_41ml5.

Ankyrin G: The ankyrin 3 gene encodes a novel ankyrin, which is expressed in
multiple tissues, with very high expression at the axonal initial segment and nodes of Ranvier
of neurons in the central and peripheral nervous systems. Ankyrin G undergoes several
tissue-specific alternative mRNA processing events. The different ankyrin G proteins participate in
maintenance/targeting of ion channels and cell adhesion molecules to nodes of Ranvier and
axonal initial segments. Ankyrin G has been associated (as potentially diagnostic,
therapeutic, causative, and/or related, etc.) with Werner disease (OMIM *277700). Clones
in this category include: fkd2_24p5.

Zn-T-transporters: The Zn-T-transporters are membrane proteins that facilitate
sequestration of zinc in endosomal vesicles. In the brain, ZnT-3 mRNA seems to be involved
in the accumulation of zinc in synaptic vesicles. Zinc (Zn) is an essential element in normal
development and metabolism. Recent studies show that in Alzheimer's disease, Zn functions
as a double-edged sword, affording protection against Alzheimer's amyloid beta peptide (the
major component of senile plaques) at low concentrations and enhancing toxicity at high
concentrations by accelerated aggregation of the amyloid beta peptide. These proteins can
find application in modulation of zinc transport in neuronal cells, thus providing means for
modulating Alzheimer's amyloid beta peptide plaque formation (OMIM *602878,
*602095). Clones in this category include: fbr2_62fl 0.

Metabolism 

This group includes proteins which are involved in the uptake and consumption of
nutrients, and enzymes which are part of the biochemical pathways for energy metabolism or
which are involved in the supply of building blocks of nucleic acids and proteins (NTPs, dNTPs,
amino acids) for DNA/RNA and protein synthesis, and of fatty acids (membranes), to allow for
the generation of higher order structures. This group constitutes the most important and
largest group in prokaryotes and lower eukaryotes. The higher the evolutionary level of an
organism, however, the more other protein classes such as 'signal transduction', 'cell cycle'
and 'differentiation and development' increase in importance and number of representatives.

Proteins involved in the metabolism of energy and compounds (here: other than
nucleic acids or proteins) are usually the products of housekeeping genes; they are often
constitutively and/or ubiquitously expressed.

Several categories of proteins are coded for by clones of the invention within the 
overall group of Metabolism: 

NAT1, ARD1: In yeast, ARD1 and NAT1 are required for the expression of an
N-terminal protein acetyltransferase 1. NAT1 controls full repression of the silent mating type
locus HML, sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1
complex. These proteins can find application in modulating NAT assembly and action and therefore
could be important in metabolism of drugs and environmental mutagens (OMIM *108345).
Clones in this category include: ft)r2 3g8.

Apolipoprotein E receptor: In LDL-receptors the class A domains form the binding
site for LDL and calcium. The acidic residues between the fourth and sixth cysteines are
important for high-affinity binding of positively charged sequences in LDLR's ligands. These
proteins can find application in modulation of cholesterol binding and transport by
LDL-receptors and LDL-binding proteins. In normal individuals, chylomicron remnants and very
low density lipoprotein (VLDL) remnants are rapidly removed from the circulation by
receptor-mediated endocytosis in the liver. In familial dysbetalipoproteinemia, or type III
hyperlipoproteinemia (HLP III), increased plasma cholesterol and triglycerides are the
consequence of impaired clearance of chylomicron and VLDL remnants because of a defect
in apolipoprotein E. Accumulation of the remnants can result in xanthomatosis and premature
coronary and/or peripheral vascular disease. OMIM reports that apolipoprotein has
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases: 1) Familial hypercholesterolemia (OMIM 143890); 2) Familial combined
hyperlipidemia (OMIM 144250); and 3) Alzheimer disease (OMIM #104300). Clones in this
category include: fbr2_62017.

Ubiquitin carboxyl-terminal hydrolases: Ubiquitin carboxyl-terminal hydrolases (EC
3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol proteases that recognize and hydrolyze
the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the
processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. OMIM reports
that ubiquitin-specific proteases have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) Lung carcinoma (OMIM
*603486); 2) X-linked retinal diseases (OMIM *300050); 3) oncogenesis (OMIM *300050); 4)
ovarian cancer (OMIM *300050). Clones in this category include: fbr2_78k24; htes3_27dl.

Phosphoserine signature (phosphoglucomutases, phosphomannomutases): These
proteins take part in the conversion of hexose phosphates. OMIM reports that these proteins
have associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
the following disease: Fanconi-Bickel Syndrome (OMIM #227810). Clones in this category
include: fkd2 24bl5.


NADH ubiquinone oxidoreductase: NADH:ubiquinone oxidoreductase is the first
enzyme in the respiratory electron transport chain of mitochondria. It is a membrane-bound
multi-subunit protein. The bovine heart enzyme contains about 40 different polypeptides.
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following disease: Branchio-oto-renal syndrome
(OMIM *6601445). Clones in this category include: flcd2_3ol7.

Transketolases: Transketolase requires thiamin pyrophosphate as a cofactor and shows
a wide specificity for both reactants; e.g., it converts hydroxypyruvate and R-CHO into CO(2)
and R-CHOH-CO-CH(2)OH. OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
disease: Wernicke-Korsakoff Syndrome (OMIM *277730). Clones in this category include:
tes3_17117.

Fatty acid-CoA synthetases/lipases: These proteins contain AMP-binding domain
signature(s), which are present in enzymes that act via an ATP-dependent covalent binding
of AMP to their substrate. This domain is found in several CoA synthetases, such as
acetate-CoA ligase (EC 6.2.1.1), long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase.
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) Alport syndrome, mental
retardation and elliptocytosis (OMIM *300157); 2) Adrenoleukodystrophy (OMIM *300100).
Clones in this category include: tes3_35kl7.

ADP/ATP or Adenine Nucleotide Translocators: These proteins contain
mitochondrial energy transfer signature(s) and are most abundant in mitochondria. In its
functional state, the translocator is a homodimer of 30-kD subunits embedded asymmetrically
in the inner mitochondrial membrane. The dimer forms a gated pore through which ADP is moved
into the matrix and ATP is moved from the matrix into the cytoplasm. OMIM reports that these
proteins have associations (as potentially diagnostic, therapeutic, causative, and/or related,
etc.) with the following diseases: 1) cardiomyopathy (OMIM *103220); 2) myopathy (OMIM *103220);
3) Progressive external ophthalmoplegia (OMIM *601227). Clones in this category include:
tes3_35nI2.

Carboxylesterases: OMIM reports that these proteins have associations (as potentially
diagnostic, therapeutic, causative, and/or related, etc.) with the following diseases:
1) hepatic carboxylesterase with detoxification of foreign compounds (OMIM *114835); 2)
non-Hodgkin lymphoma (OMIM *114835); 3) B-cell chronic lymphocytic leukemia (OMIM
*114835); 4) rheumatoid arthritis (OMIM *114835). Clones in this category include:
tes3_35n9.

Heat shock proteins: OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
diseases: 1) 27 kD heat shock protein has been correlated with thermotolerance in response to
environmental challenges and developmental transitions (OMIM *6021295). Clones in this
category include: utell_23el3.

Nucleic acid management 

The genetic information is stored in the form of nucleic acids in all organisms. Two 
kinds of nucleic acids exist, DNA and RNA. Whereas the more stable DNA in most 
organisms constitutes the storage form of the genetic information, the labile RNA and in 
particular mRNA is an intermediate used for the temporal expression of specific genes. 

In eukaryotes, DNA is usually a double stranded linear molecule consisting of two
antiparallel strands and made up of deoxyribose, a phosphate backbone and the four bases
A, C, G, and T. The DNA of some organisms has a ring structure. The structure of DNA was
unraveled by Watson and Crick in 1953. DNA is a directional molecule, its direction being
determined by the carbon atoms of the sugar.

The most important processes dealing with nucleic acids are:

• replication (e.g. DNA polymerases, Telomerase) 

• transcription (RNA polymerases) 

• RNA processing (maturation - splicing and degradation) 

• in addition, enzymes and proteins exist which require a nucleic acid (mostly RNA) in the 
active center to be functional (ribozymes - e.g. RNase, Ribosomal proteins) 

The DNA of a cell is replicated in the S-phase of the cell cycle. Several enzymes carry
out the task of doubling this nucleic acid. Like all steps of the cell cycle, the process of
replication is tightly regulated. The enzyme DNA polymerase and several other proteins are
involved in this process. Whereas many prokaryotes have only one origin of replication
(i.e., the starting point of the replication cycle), in eukaryotic DNAs (chromosomes) multiple
such start points exist. The switch from the synthesis (S) phase to the subsequent G2 or M
phases of the cell cycle is dependent on the completion of replication. This makes clear
that a number of proteins are involved in replication itself as well as in the control of the
process. Since most eukaryotic chromosomes are linear structures, additional proteins and
enzymes are necessary to make sure that the structure is maintained through successive
generations. These include the proteins necessary to build the three-dimensional structure of
chromosomes (e.g. histones) and the structural network of the nucleus and nucleolus
(including the defined localization of transcriptionally active genes in the vicinity of nucleoli),
but also such enzymes as telomerase, which guarantees the integrity of the chromosomal ends.

The expression of genes is usually performed in two steps. First, a messenger RNA
(mRNA) is produced (transcribed) in one to many copies, and second, this mRNA is translated
into the protein product. The regulation of transcription is discussed under the separate
heading 'transcription factors', but the classes 'signal transduction', 'development', 'cell
cycle' and others are also affected, as the expression of certain genes determines the fate of a cell
or organism.

The primary transcript (hnRNA - heterogeneous nuclear RNA) is a single stranded
one-to-one copy of the gene as it is located on the chromosome. Before a protein can be
translated, the transcript must mature, a process that begins during transcription. First, a 5' cap
structure is enzymatically and covalently added to the RNA, blocking the 5' end of the RNA.
Second, when the RNA polymerase has terminated polymerization, the enzyme poly A
polymerase adds varying numbers of adenine residues to the 3' end of the transcript. This
enzyme recognizes the sequence AAUAAA or AUUAAA (+ some minor variations), cuts the
RNA 10-30 nucleotides downstream and adds the A residues. The size of the poly A
sequence affects the stability of the RNA. Finally, in the process of splicing, the introns
present on the genomic level and also present in the hnRNA are spliced out by a multi-protein
complex consisting of several proteins and RNAs. The mature mRNA is finally exported
to the cytoplasm, where it is translated with the help of the ribosomes.
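
The polyadenylation signal rule stated above (AAUAAA or AUUAAA, with cleavage 10-30 nucleotides downstream) can be sketched as a simple sequence scan. The Python below is a minimal illustration only: the function name cleavage_windows and its parameters are hypothetical, the offsets are assumed to be counted from the 3' end of the hexamer, and the minor signal variants mentioned in the text are not modelled.

```python
import re

# Lookahead pattern so that overlapping hexamers are all reported.
# Matches the two canonical signals named in the text: AAUAAA and AUUAAA.
SIGNAL = re.compile(r"(?=(A[AU]UAAA))")


def cleavage_windows(rna, min_offset=10, max_offset=30):
    """Find poly(A) signals and the region where cleavage could occur.

    Returns a list of (signal_start, signal, window_start, window_end)
    tuples using 0-based, end-exclusive coordinates. The window is
    assumed to begin `min_offset` nucleotides after the hexamer ends and
    is clipped to the end of the sequence.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for match in SIGNAL.finditer(rna):
        signal_end = match.start() + 6
        window_start = signal_end + min_offset
        window_end = min(signal_end + max_offset, len(rna))
        if window_start < window_end:  # skip signals too close to the 3' end
            hits.append((match.start(), match.group(1), window_start, window_end))
    return hits


if __name__ == "__main__":
    transcript = "GGG" + "AAUAAA" + "C" * 40
    for hit in cleavage_windows(transcript):
        print(hit)
```

A signal that lies fewer than 10 nucleotides from the end of the input sequence yields no window and is therefore not reported.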

The half life of RNA is usually much shorter than that of DNA. Usually, the mRNA is 
degraded shortly after synthesis, to guarantee a very defined window of expression of a given 
gene. This regulation is necessary to specifically maintain or change the set of proteins 
present at any time in a cell. Specific regions in the 3'UTR (untranslated region) determine
the stability of the mRNA in the cytoplasm before it is degraded by RNases, enzymes 
consisting both of protein and RNA. 

References: Watson and Crick (1953) Nature 171: 737-738.

Several categories of proteins are coded for by clones of the invention within the
overall group of "Nucleic acid management", including, among others, the following:

RNA helicases including DEAD/H box helicases: RNA helicases comprise a large
family of proteins that are involved in basic biological systems such as nuclear and
mitochondrial splicing processes, RNA editing, rRNA processing, translation initiation,
nuclear mRNA export, and mRNA degradation. RNA helicases are essential factors in cell
development and differentiation, and some of them play a role in transcription and replication
of viral single-stranded RNA genomes. The members of the largest subgroup, the DEAD and
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP
hydrolysis. DEAD box proteins have been associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.), as reported by OMIM, with the following disease processes and/or
genes: 1) ataxia-telangiectasia gene: "A human gene (DDX10) encoding a putative DEAD-box
RNA helicase at 11q22-q23". Genomics 33: 199-206, 1996, Savitsky et al. (OMIM
*601235); 2) hematopoietic tumors: "Cloning and expression of a murine cDNA homologous
to the human RCK/P54, a lymphoma-linked chromosomal breakpoint 11q23". Gene 166: 293-6,
1995, Seto et al. (OMIM *600326); 3) dermatomyositis: a) "The major dermatomyositis-specific
Mi-2 autoantigen is a presumed helicase involved in transcriptional activation."
Arthritis Rheum. 38: 1389-1399, 1995, Seelig et al. (OMIM *603277); b) "Two forms of the
major antigenic protein of the dermatomyositis-specific Mi-2 autoantigen." (Letter), Arthritis
Rheum. 39: 1769-1771, 1996, Seelig et al. (OMIM *603277); c) "The dermatomyositis-specific
autoantigen Mi2 is a component of a complex containing histone deacetylase and
nucleosome remodeling activities". Cell 95: 279-289, 1998, Zhang et al. (OMIM *603277); 4)
Muscular Dystrophy, Pseudohypertrophic Progressive, Duchenne and Becker Types (OMIM
*310200); 5) Mucopolysaccharidosis Type IVA (OMIM *253000); 6) Albinism I (OMIM
*203100); 7) Wilms Tumor 1 (OMIM *194070); 8) Spinocerebellar Ataxia 7 (OMIM
*164500). Clones in this category include: fbr2_23blO, fbr2_3cl8, fbr2_6ol7, fbr2_82i24,
and tes3_14h21.

Inorganic pyrophosphatase: Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the
enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed as the product
of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of
divalent metal cations, with magnesium conferring the highest activity. Clones in this
category include: fbr2_64a 15.

DNA-damage-inducible protein (dinP) or proteins induced by DNA damage: The
dinB/P pathway is a second SOS pathway in E. coli. Genes related to this seem to be
involved in modulating DNA repair and mutagenesis. Clones in this category include:
fbr2_72bl8.

Proteins with myc-type helix-loop-helix dimerization domain signature(s): This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as the
myc family of cellular oncogenes, proteins involved in myogenesis, and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers. Therefore,
these proteins could be novel DNA-binding proteins. Clones in this category include:
fbr2_72112.

Cytosolic ribosomal proteins: L36 seems to be part of the eukaryotic ribosomal
peptidyl transferase center and can find application in modulation of ribosome assembly,
maintenance and activity. Clones in this category include: fkd2_3b2.

Ribonuclease H: Ribonuclease H proteins are RNA-modifying proteins and have
been associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
the following diseases as reported by OMIM: 1) Adenomatous Polyposis of the Colon (OMIM
*175100); 2) Retinoblastoma (OMIM *180200); and 3) Von Hippel-Lindau Syndrome
(OMIM *193300). Clones in this category include: phtes3_15j3.


Signal transduction 

Cells in higher order organisms need to communicate continuously with their
environment, especially with other cells of the same organism, in order to maintain the
function and specialization of the whole system these cells are part of. This important task of
communication is performed with the help of cell-surface receptors, which receive and transmit
signals from outside into the cell.
G-proteins 

The largest known family of cell-surface receptors is that of the G-protein-coupled receptors,
which mediate the transmission of diverse stimuli such as neurotransmitters, glycopeptides,
hormones, peptides, odorant molecules, and photons. The functional unit of these receptors is
composed of the receptor molecule itself (GPCR), which is anchored in the cytoplasmic
membrane with seven membrane-spanning domains; the heterotrimeric G-protein, which is
composed of α and βγ subunits (Gα and Gβγ); and the effectors that interact with Gα and/or
Gβγ. In particular, the dissociated Gα and Gβγ can regulate the activities of a number of
effector molecules such as adenylate cyclases, phospholipase C isoforms, ion channels, and
tyrosine kinases, resulting in a variety of cellular functions. The process of signal
transduction must be tightly regulated and reversible in order to avoid overstimulation, to
achieve signal termination, and to render the receptor responsive to subsequent stimuli
[Iacovelli, L. et al. (1999) FASEB J. 13, 1-8; Hamm, H.E. (1998) J. Biol. Chem. 273,
669-672].

G-proteins are GTPases that, upon binding of GTP, change their conformation, which
in turn unmasks structural motifs, in particular the so-called effector loop, which can
mediate the interactions with target proteins, or effectors, of the GTPases. This ability enables
the GTPases to cycle between active, GTP-bound and inactive, GDP-bound conformations
and in the process to function as molecular traffic lights in a multitude of signal transduction
pathways. The most important of the signal transduction pathways that are regulated with the
help of G-proteins are that of phospholipase C / protein kinase C and that of adenylate
cyclase / protein kinase A.

The cycling of GTPases is tightly regulated by three main classes of proteins: the
exchange of GDP, the product of hydrolysis, for a fresh GTP is facilitated by guanine nucleotide
exchange factors (GEFs), the hydrolysis of GTP to GDP is sped up by GTPase-activating proteins
(GAPs), and the dissociation of GDP from the GTPases is inhibited by GDP dissociation
inhibitors (GDIs) [Tapon and Hall (1997) Curr. Opin. Cell Biol. 9, 86-92; Van Aelst and
D'Souza-Schorey (1997) Genes Dev. 11, 2295-2322].

SOCS family

A conserved motif that was originally identified in proteins that negatively regulate
the signaling action of cytokines was termed the SOCS box, for Suppressor Of Cytokine
Signaling. Based on homology, five distinct structural protein classes have since been identified
that carry this motif. The function of most of these proteins is presently not known.
Common to the proteins is only the SOCS box, which is located near the C-terminus of the
respective peptides. Recently, the SOCS box has been demonstrated to induce binding of
proteins to elongins B and C, which could target the proteins (and bound substrates) to the
proteasomal protein degradation pathway (Kamura, T. et al. (1998) Genes Dev. 12, 3872-3881;
Zhang, J.-G. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 2071-2076).

The class where the SOCS box was originally described contains several members
(SOCS-1 to SOCS-7 and CIS). In addition to the SOCS box, these proteins also contain an SH2
(Src-homology 2) domain and a variable N-terminus. These SOCS proteins appear to form
part of a classical negative feedback loop that regulates cytokine signal transduction. Upon
cytokine stimulation, expression of SOCS proteins is rapidly induced and the proteins inhibit
further cytokine action. The mode of action of the SOCS proteins is variable. While SOCS-1
binds and inhibits the JAK (Janus kinase) family of cytoplasmic protein kinases [Narazaki,
M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134; Nicholson, S.E. et al. (1999)
EMBO J. 18, 375-385], CIS appears to act by competing with signaling molecules such as
the STAT (Signal Transducers and Activators of Transcription) family for binding to
phosphorylated receptor cytoplasmic domains [Yoshimura, A. et al. (1995) EMBO J. 14,
2816-2826; Matsumoto, A. et al. (1997) Blood 89, 3148-3154].

A second class of SOCS box proteins additionally contains WD-40 repeats, which were
initially identified in the mouse WSB-1 and -2 proteins. The functions of WD-40 proteins are
not completely understood but seem to be rather divergent. In Cdc4p the WD-40 repeats
probably are necessary for binding the substrate for Cdc34p [Mathias, N. et al. (1999) Mol.
Cell. Biol. 19, 1759-1767]. Cdc4p is a component of a ubiquitin ligase that tethers the
ubiquitin-conjugating enzyme Cdc34p to its substrates. The posttranslational modification of
a protein by ubiquitin usually results in rapid degradation of the ubiquitinated protein by the
proteasome. The transfer of ubiquitin to substrate is a multistep process where WD-40 repeats
might play an important function.

Other WD-40 containing proteins (e.g. the retinoblastoma binding protein RbAp48)
have been shown to bind metal ions (zinc), and this metal binding might mediate and/or
regulate protein-protein interactions which are functionally important in chromatin
metabolism [Kenzior, A.L. and Folk, W.R. (1998) FEBS Lett. 440, 425-429]. These proteins
are involved in the RAS-cAMP pathway that regulates cellular growth [Ach, R.A. et al.
(1997) Plant Cell 9, 1595-1606].

The SPRY domain has been identified in pyrin or marenostrin, a protein which is
mutated in patients with Mediterranean fever and which is similar to the butyrophilin family.
While butyrophilins seem to be involved in the lactation process in mammals, the function
of pyrin is unknown. Three proteins (SSB-1 to -3) have been identified to contain both SPRY
and SOCS box motifs. The function of these proteins is also not known.

Ankyrin repeat containing proteins share a 33-residue repeating motif, an L-shaped
structure with protruding β-hairpin tips which mediate specific macromolecular interactions
with cytoskeletal, membrane, and regulatory proteins. These proteins play fundamental roles
in diverse biological activities including growth and development, intracellular protein
trafficking, the establishment and maintenance of cellular polarity, cell adhesion, signal
transduction, and mRNA transcription. Three proteins that contain ankyrin repeats (ASB-1 to
-3) have been identified to contain a C-terminal SOCS box in addition to the ankyrin
repeats. The function of these proteins or the individual domains remains to be discovered
[Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119].

A few small GTPases (RAR and RAR-like) also contain a SOCS box. GTPases are
involved in signal transduction during cellular communication. The function of the SOCS box
in this type of protein is currently unclear [Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci.
USA 95, 114-119].

Ca2+ as second messenger

The bivalent cation Ca2+ is, besides cAMP, one of the two major second messengers
in eukaryotic cells. Its intracellular concentration is tightly regulated and usually kept very
low compared to the cell's environment. Ca2+-binding proteins and transporters (gap junction,
voltage-gated, second messenger-gated) help to sequester huge amounts of the ion in various
organelles, from which Ca2+ can be released upon extracellular stimuli. For example, the
contraction of muscle is dependent on the presence of Ca2+ ions, which are readily transported
back into the organelles in order for the muscle to relax. In signal transduction, Ca2+ functions as a
second messenger that activates Ca2+-dependent processes through the activation of
Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are the major effector
molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases activate
phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such as
protein kinase C.

cAMP 

Cyclic AMP is produced by the enzyme adenylate cyclase in response to 
extracellular signals. Certain G-proteins stimulate the activity of adenylate cyclase, which 
converts ATP to cAMP and PPi. Two molecules of cAMP bind to each of the two regulatory 
subunits of cAMP dependent protein kinase, which in turn dissociate from the two catalytic 
subunits of the heterotetramer R2C2. Upon release, the C-subunits become active and 
phosphorylate substrate proteins at Ser and Thr residues. The process leading from binding of 
extracellular molecules to their receptors, the transmission of the stimuli into the cell, the 
activation of adenylate cyclase and the subsequent activation of cAMP dependent protein 
kinase is one of two major signal transduction pathways in eukaryotic cells. Since the 
phosphorylation of proteins is a posttranslational modification of proteins, the kinases are 
described in the class "signal transduction." 

SARA 

Members of the transforming growth factor β (TGFβ) superfamily signal through a 
family of cell-surface transmembrane serine/threonine kinases, known as type I and type II 
receptors (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué, 
1998). Ligand induces formation of heteromeric complexes of these receptors, and signaling 
is initiated when receptor I is phosphorylated and activated by the constitutively active kinase 
of receptor II (Wrana et al., 1994). The activated type I receptor kinase then propagates the 
signal to a family of intracellular signaling mediators known as Smads (a contraction of the 
C. elegans Sma and Drosophila Mad genes, which were the first identified members of this 
class of signaling effectors). 


WO 01/12659 PCT/IB00/01496 

Three classes of Smads with distinct functions have been defined: the receptor-regulated 
Smads, which include Smad1, 2, 3, 5, and 8; the common mediator Smad, Smad4; 
and the antagonistic Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano and 
Wrana, 1998; Kretzschmar and Massagué, 1998). Receptor-regulated Smads (R-Smads) act 
as direct substrates of specific type I receptors, and the proteins are phosphorylated on the last 
two serines at the carboxyl terminus within a highly conserved SSXS motif (Macías-Silva et 
al., 1996; Abdollah et al., 1997; Kretzschmar et al., 1997; Liu et al., 1997b; Souchelnytskyi 
et al., 1997). Regulation of R-Smads by the receptor kinase provides an important level of 
specificity in this system. Thus, Smad2 and Smad3 are substrates of TGFβ or activin 
receptors and mediate signaling by these ligands (Macías-Silva et al., 1996; Liu et al., 1997b; 
Nakao et al., 1997), whereas Smad1, 5, and 8 are targets of BMP receptors and propagate 
BMP signals (Hoodless et al., 1996; Chen et al., 1997b; Kretzschmar et al., 1997; 
Nishimura et al., 1998). Once phosphorylated, R-Smads associate with the common Smad, 
Smad4 (Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear translocation of the 
heteromeric complex. In the nucleus, Smad complexes then activate specific genes through 
cooperative interactions with DNA and other DNA-binding proteins such as FAST1, FAST2, 
and Fos/Jun (Chen et al., 1996; Chen et al., 1997a; Liu et al., 1997a; Labbé et al., 1998; 
Zhang et al., 1998; Zhou et al., 1998). In contrast to R-Smads and Smad4, the antagonistic 
Smads, Smad6 and 7, appear to function by blocking ligand-dependent signaling (reviewed in 
Heldin et al., 1997). 

Phosphorylation of R-Smads by the type I receptor is essential for activating the 
TGFβ signaling pathway (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and 
Massagué, 1998). However, little is known of how Smad interaction with receptors is 
controlled. A novel Smad2/Smad3 interacting protein has been described (Tsukazaki T. et al., 
1998) that contains a double zinc finger, or FYVE domain, and which has been called SARA 
(Smad anchor for receptor activation). The SARA motif recruits Smad2 into distinct 
subcellular domains and co-localizes and interacts with TGFβ receptors. TGFβ signaling 
induces dissociation of Smad2 from SARA with concomitant formation of Smad2/Smad4 
complexes and nuclear translocation. Moreover, deletion of the FYVE domain in SARA 
causes mislocalization of Smad2 and inhibits TGFβ-dependent transcriptional responses. 
Thus, SARA defines a component of TGFβ signaling that functions to recruit Smad2 to the 
receptor by controlling the subcellular localization of Smad. 

Rab proteins 

In eukaryotic cells the compartmentalization of processes is a prerequisite for a tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Südhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP-bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a SNAP-25 
molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, suggesting that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, 
most likely through a secondary factor termed a GDI dissociation factor (GDF), which 
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange 
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 
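
The Rab cycle described above can be read as a small state machine in which each regulatory factor (GDF, GEF, GAP, GDI) moves the Rab from one state to the next. The following Python sketch is illustrative only: the state names, the `TRANSITIONS` table and the helper functions are hypothetical labels chosen here, and the model ignores effectors, vesicles and kinetics.

```python
from enum import Enum


class RabState(Enum):
    CYTOSOLIC_GDP_GDI = "GDP-bound, cytosolic, in complex with GDI"
    MEMBRANE_GDP = "GDP-bound, membrane-associated"
    MEMBRANE_GTP = "GTP-bound, membrane-associated (can bind effectors)"
    HYDROLYZED_GDP = "GDP-bound after hydrolysis, still on the membrane"


# (current state, acting factor) -> next state, in the order given in the text.
TRANSITIONS = {
    (RabState.CYTOSOLIC_GDP_GDI, "GDF"): RabState.MEMBRANE_GDP,   # GDF displaces GDI
    (RabState.MEMBRANE_GDP, "GEF"): RabState.MEMBRANE_GTP,        # GDP released, GTP loaded
    (RabState.MEMBRANE_GTP, "GAP"): RabState.HYDROLYZED_GDP,      # GTP hydrolysis
    (RabState.HYDROLYZED_GDP, "GDI"): RabState.CYTOSOLIC_GDP_GDI, # extraction from membrane
}


def step(state, factor):
    """Return the next state, or raise if the factor does not act on this state."""
    try:
        return TRANSITIONS[(state, factor)]
    except KeyError:
        raise ValueError(f"{factor} does not act on a Rab that is: {state.value}") from None


def run_cycle(start=RabState.CYTOSOLIC_GDP_GDI, factors=("GDF", "GEF", "GAP", "GDI")):
    """Apply the factors in order and return the list of visited states."""
    visited = [start]
    for factor in factors:
        visited.append(step(visited[-1], factor))
    return visited
```

Calling `run_cycle()` walks once around the cycle and ends in the state it started from, which mirrors the statement that the GDP-bound Rab can take part in a new round of fusion.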

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by 
a coiled-coil stalk region and a RBD that specifically binds Rab6 (Echard et al., 1998 ). An 
additional link with the cytoskeleton is provided by the Rab effector, Rabphilin-3A. 
Rabphilin-3A has been shown in vitro to interact with α-actinin, an actin-bundling protein, but 
only when not bound to Rab3A (Kato et al., 1996 ). These results raise the intriguing 
possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby 
play an active role in targeting vesicles to their appropriate destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some 
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target 
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and 
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle 
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors 
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A 
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits 
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p 
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab 
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore 
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab 
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE 
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and 
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991). Mol. Cell. Biol. 11, 872-885; Echard et al. (1998). 
Science. 279, 580-585; Geppert et al. (1998). Annu. Rev. Neurosci. 21, 75-95; Guo et al. 
(1999). EMBO J. 18, 1071-1080; Kato et al. (1996). J. Biol. Chem. 271, 31775-31778; 
Novick et al. (1997). Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999). Curr. Biol. 9, 
159-162; Poirier et al. (1998). Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998). EMBO J. 17, 
1941-1951; Wang et al. (1997). Nature. 388, 593-598; Yang et al. (1999). J. Biol. Chem. 274, 
5649-5653. 

Kinases 

Reversible posttranslational modifications of proteins are a major means of regulating 
cellular activities. Among the various modifications that are carried out by cells, the 
addition of phosphoryl groups to Ser/Thr or Tyr residues is the most important and widely 
used. The phosphorylation of proteins is accomplished by protein kinases, while the reverse 
reaction, the removal of phosphoryl groups, is carried out by phosphatases. Kinases and 
phosphatases regulate key steps, e.g. in the processes of cell proliferation, differentiation 
and communication/signaling. These processes must be tightly regulated in order to maintain 
the cell in a steady state. Misregulation of kinase activities (or those of 
phosphatases) is held responsible for a multitude of disease processes such as oncogenesis, 
inflammatory processes, arteriosclerosis, and psoriasis. 

Protein kinases constitute the largest protein family that is currently known. Several 
hundred kinases have been identified already. Classically, kinases are subdivided into two 
classes based on the amino acid residues in their substrates that are phosphorylated by the 
particular enzymes. The kinases specifically add phosphoryl groups from adenosine 
triphosphate (ATP) or, less frequently, guanosine triphosphate (GTP), either to serine and/or 
threonine or to tyrosine residues of substrate proteins. An estimated 1,000 to 10,000 proteins 
present in a typical mammalian cell are believed to be regulated by the action of protein 
kinases. 

Protein kinases are frequently integral parts of signaling cascades that transmit 
extracellular stimuli (e.g. hormones, neurotransmitters, growth or differentiation factors) into 
the cell and result in various responses by the cells. The kinases play key roles in these 
cascades as they act as 'molecular switches', turning on or off the activities of 
other enzymes and proteins, e.g. metabolic and regulatory proteins, channels and pumps, 
receptors, cytoskeletal proteins and transcription factors. 

The regulation of kinase activities is accomplished by various means: 

The best characterized example of regulation via regulatory subunits is the 
cAMP-dependent protein kinase (PKA), which is also a prototype for second messenger 
activated protein kinases. This enzyme consists of a heterotetramer of two catalytic (C) and 
two regulatory (R) subunits. Upon binding of two molecules of second messenger (cAMP) to 
each R subunit, the catalytic subunits are released and become active. Several isoforms exist 
of both the catalytic and the regulatory subunits. The combination of catalytic and regulatory 
subunits determines the localization of the holoenzyme and also the substrate spectrum that is 
available for phosphorylation. The consensus pattern that must be present in the substrate 
for PKA action is RRXS/T, where X can be any amino acid. 
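
A consensus pattern of this kind can be searched for with a regular expression. The sketch below takes the PKA consensus to be Arg-Arg-X-Ser/Thr (an assumed reading of the pattern given above) and reports the position of the candidate phospho-acceptor residue. The function name `find_pka_sites` is hypothetical, and a match only flags a candidate site; it says nothing about whether the site is phosphorylated in vivo.

```python
import re

# Assumed consensus: R-R-X-S/T, with X any residue.
# The lookahead lets overlapping matches (e.g. "RRRST") all be reported.
PKA_SITE = re.compile(r"(?=(RR.[ST]))")


def find_pka_sites(seq):
    """Return 1-based positions of the Ser/Thr in every R-R-X-S/T match."""
    return [m.start() + 4 for m in PKA_SITE.finditer(seq.upper())]
```

For the synthetic peptide LRRASLG, the only candidate acceptor is the serine at position 5.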

Casein kinase II is another example of a holoenzyme that consists of 
catalytic and regulatory subunits. Other kinases that are activated by second messengers are 
cGMP-dependent protein kinase and protein kinase C (PKC), which is activated by 
diacylglycerol, which in turn is produced by phospholipases by cleavage of 
phosphatidylcholine. 

Receptor kinases usually consist of an extracellular domain which can bind effector 
molecules (e.g. growth factors and hormones) and transfer the stimulus to the intracellular 
domain of these proteins, which usually is a protein tyrosine kinase. Other tyrosine kinases 
lack an extracellular domain but are associated with receptors which transfer the signal after 
effector binding by activating the associated protein kinase enzyme (e.g. Src kinase family: 
Src, Blk, Fgr, Fyn, Lck, Lyn, Yes; and Janus kinase family: Jak1-3, Tyk2). 

Dysfunction of kinases, e.g. caused by non-functioning regulation, can be the cause of 
inflammatory diseases and uncontrolled proliferation. v-Src, which is a truncated version of 
the c-Src protooncogene tyrosine kinase, is a classical example of this process, as v-Src does 
not contain the regulatory domain of the cellular gene and is thus constitutively active. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Signal transduction" and include, among others, the following: 

Neurocalcin (Recoverin): Neurocalcin is a Ca(2+)-binding protein with three putative 
Ca(2+)-binding domains (EF-hands). In cattle, 6 isoforms are differentially expressed in the 
central nervous system, retina and adrenal gland. Homology with recoverin indicates 
involvement in Ca2+ dependent activation of guanylate cyclase. These proteins can find 
application in modulating/blocking the guanylate cyclase pathway. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM, 1) autosomal dominant cone dystrophy (OMIM *600364); 2) 
cone dystrophy 3 (OMIM *600364); 3) cancer associated retinopathy (OMIM *179618). 
Clones in this category include: fbr2_23b21. 

Proteins with a WW Domain: These proteins contain a WW domain, which was 
originally described as a short conserved region in a number of unrelated proteins, among 
them dystrophin, the gene responsible for Duchenne muscular dystrophy. The domain, which 
spans about 35 residues, is repeated up to 4 times in some proteins. It has been shown to bind 
proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 
domains. This domain is frequently associated with other domains typical for proteins in 
signal transduction processes. Examples of proteins containing the WW domain are 
dystrophin, utrophin, vertebrate YAP protein (binds the SH3 domain of the Yes 
oncoprotein), murine NEDD-4 (embryonic development and differentiation of the central 
nervous system), and IQGAP (human GTPase activating protein acting on ras). Therefore these 
proteins should be involved in intracellular signal transduction. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM, 1) Muscular Dystrophy, Pseudohypertrophic Progressive, 
Duchenne and Becker Types (OMIM *310200). Clones in this category include: fbr2_23nl6. 
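
The proline motif [AP]-P-P-[AP]-Y bound by WW domains is already written in a pattern notation that maps directly onto a regular expression. As a minimal sketch (the function name `find_ww_ligand_motifs` is a hypothetical label, and a sequence match is only a candidate binding site, not evidence of binding):

```python
import re

# [AP]-P-P-[AP]-Y written as a regular expression; the lookahead reports
# overlapping occurrences as well.
WW_LIGAND = re.compile(r"(?=([AP]PP[AP]Y))")


def find_ww_ligand_motifs(seq):
    """Return (1-based start position, matched pentapeptide) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in WW_LIGAND.finditer(seq.upper())]
```

For example, the test sequence GTPPPPYTVG contains one hit, PPPPY, starting at position 3.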

Protein substrates for cAMP-dependent protein kinase: Acting as a chloride channel or 
chloride channel inhibitor, these proteins have been associated (as potentially diagnostic, 
therapeutic, causative, and/or related, etc.), as reported by OMIM, with Cystic Fibrosis 
(OMIM #219700). Clones in this category include fbr2 J2il7. 

Sphingosine kinase: Sphingosine kinase is a new type of lipid kinase, which is 
regulated by growth factors. The enzyme phosphorylates sphingosine, which subsequently 
exerts intracellular and extracellular actions. Intracellularly, sphingosine 1-phosphate (SPP) 
promotes proliferation and inhibits apoptosis. In yeast, survival of cells exposed to heat shock 
is dependent on SPP. Extracellularly, SPP inhibits cell motility and influences cell 
morphology, effects that appear to be mediated by the G protein-coupled receptor EDG1. 
These proteins have been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.), as reported by OMIM, with Gaucher Disease, Type I (OMIM *230800). Clones 
in this category include fbr2_82m6. 

Vanilloid Receptors: VR1 seems to play an important role in the activation and 
sensitization of nociceptors. It is the receptor for e.g. capsaicin, a natural product of 
capsicum peppers and a selective activator of nociceptors. Related proteins can find 
application as targets for the development of new nociception-modulating drugs. Clones in 
this category include tes3_20k2. 

RCC1 (Regulator of chromosome condensation): RCC1 (regulator of chromosome 
condensation) is a eukaryotic protein which binds to chromatin and interacts with ran, a 
nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP with GTP, acting 
as a guanine-nucleotide dissociation stimulator. These proteins can find application in the 
regulation of gene expression by activation of nuclear GTP-binding proteins. X-linked 
retinitis pigmentosa is the result of a defective GTPase regulator, which contains an RCC1-type 
repeat. OMIM also reports that RCC1 has associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with retinitis pigmentosa (OMIM +312610). Clones in this 
category include tes3_21d4. 

Ras inhibitor proteins: Ras is a signal transducing molecule involved in the receptor 
tyrosine kinase/Ras/MAP kinase signalling cascade. Ras proteins bind GDP/GTP and show 
intrinsic GTPase activity. Mutations in ras which change amino acid 12, 13 or 61 activate the 
potential of ras to transform cultured cells and are implicated in a variety of human tumours. 
Ras inhibitor proteins have been associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with many disease processes, as reported by OMIM, including: 1) 
tumors of the lung, breast, brain, pituitary, pancreas, bone, skin, bladder, kidney, ovary, 
prostate and lymphocyte; melanoma (OMIM *600160); 2) X-linked non-specific mental 
retardation (OMIM *300104); 3) adenomatous polyposis of the colon (OMIM *175100); 4) 
Beckwith-Wiedemann Syndrome (OMIM #130650); and 5) major affective disorder 1 (OMIM 
*125480). Clones in this category include utel_22g21. 

Mammalian cornichon proteins involving the EGF-receptor: Cornichon proteins are part 
of a signal transduction pathway involving the EGF-receptor. The EGF-receptor has been 
reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with the following diseases: 1) Familial hypercholesterolemia (OMIM 
143890); 2) Leprechaunism (OMIM #246200); 3) Hemophilia B (OMIM *306900); 4) 
Ectodermal dysplasia 1; 5) Kartagener syndrome (OMIM *244400); and 6) Glioma of the 
brain (OMIM *137800). Clones in this category include utel_22el2. 

Transmembrane proteins 

Membrane region prediction was effected using the ALOM2 software (Klein et al., 
1985; version 2 by K. Nakai). Similar to many other methods, the Kyte & Doolittle (1982) 
amino acid hydrophobicity scale is used in ALOM2 as the primary variable for classifying 
sequences in terms of their localization. High prediction accuracy is achieved through the 
system of intelligent decision rules and the utilization of a carefully selected training data set. 
The method also generates reliability estimates which makes it possible to distinguish 
between membrane-spanning proteins (I, intrinsic) and globular proteins with regions of high 
hydrophobicity buried in the core. 

For a protein of length L, the block of length l with maximum hydrophobicity is 
found: 

\[
\mathrm{maxH} = \max_{1 \le k \le L-l+1} \; \frac{1}{l} \sum_{i=k}^{k+l-1} H_i
\]

where H_i represents the hydrophobicity of an individual residue. 


Let P(I/maxH) and P(E/maxH) be the conditional probabilities that a protein is 
integral or peripheral, respectively, given its value of maximal hydrophobicity maxH, and let 
P(I) and P(E) be the prior probabilities of intrinsic and extrinsic membrane proteins estimated 
from the training set. Then a sequence is assigned to E if 

P(E/maxH)>P(I/maxH) 

or, after applying the Bayes rule, 

P(E)P(maxH/E) > P(I)P(maxH/I), 

where the conditional probabilities P(maxH/E) and P(maxH/I) can be determined 
based on the estimates of probability distributions of maxH in both groups. 
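
The step from the first inequality to the second can be written out explicitly. In the sketch below the document's "/" is written as the conventional conditioning bar, and P(maxH) denotes the overall probability density of maxH, a symbol introduced here only for the derivation:

```latex
\[
P(E \mid \mathrm{maxH}) = \frac{P(E)\,P(\mathrm{maxH} \mid E)}{P(\mathrm{maxH})},
\qquad
P(I \mid \mathrm{maxH}) = \frac{P(I)\,P(\mathrm{maxH} \mid I)}{P(\mathrm{maxH})}
\]
\[
P(E \mid \mathrm{maxH}) > P(I \mid \mathrm{maxH})
\;\Longleftrightarrow\;
P(E)\,P(\mathrm{maxH} \mid E) > P(I)\,P(\mathrm{maxH} \mid I)
\]
```

The common denominator P(maxH) is positive and cancels, so only the priors and the two class-conditional distributions of maxH are needed.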

Discriminant analysis simplifies this task by calculating the odds 
P(E/maxH):P(I/maxH) as e^b, where b is the left-hand side of a linear or quadratic inequality. 
For example, for the window of length 17, the protein is allocated to the peripheral category E 
based on the empirically derived quadratic inequality: 

1.05(maxH)^2 + 12.30 maxH + 17.49 > 0, 

whereas the optimal inequality for assigning membrane proteins (category I) is linear: 
-9.02 maxH + 14.27 > 0 

The odds parameter can be made more or less stringent. For example, one can require 
odds at least 1:10 for a protein to be classified as integral. This leads to higher selectivity but 
less sensitivity. 
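
The calculation described above can be sketched in a few lines of Python. This is not the ALOM2 code. It assumes the standard Kyte & Doolittle hydropathy values, a window of 17 residues, and it reads the linear expression -9.02 maxH + 14.27 as the exponent b of the E:I odds. That reading is an assumption made here because it gives low E:I odds for strongly hydrophobic windows; the text itself attaches the linear inequality to category I. All function and parameter names are hypothetical.

```python
import math

# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobicity(seq, window=17):
    """Return (maxH, 0-based start) of the window with the highest mean hydropathy.

    Raises KeyError for non-standard residue letters and ValueError if the
    sequence is shorter than the window.
    """
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    total = sum(values[:window])
    best_total, best_start = total, 0
    for k in range(1, len(values) - window + 1):
        # Slide the window by one residue: add the new one, drop the old one.
        total += values[k + window - 1] - values[k - 1]
        if total > best_total:
            best_total, best_start = total, k
    return best_total / window, best_start


def odds_e_to_i(max_h, slope=-9.02, intercept=14.27):
    """Odds P(E/maxH):P(I/maxH) = e^b with b = slope * maxH + intercept (assumed reading)."""
    return math.exp(slope * max_h + intercept)


def classify(max_h, max_odds_for_integral=1.0):
    """Return 'I' if the E:I odds fall below the chosen cutoff, otherwise 'E'."""
    return "I" if odds_e_to_i(max_h) < max_odds_for_integral else "E"
```

Lowering `max_odds_for_integral` from 1.0 to 0.1 corresponds to requiring odds of at least 1:10 before a protein is called integral: a borderline window with maxH = 1.6 is classified as I under the default cutoff but as E under the stricter one.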

The boundaries of membrane-spanning regions in putative membrane proteins are 
detected by means of an iterative procedure whereby the most hydrophobic region 
corresponding to the value maxH is considered to be membrane and removed from the 
sequence. The classification procedure is then repeated for the remaining sequence, 
and, if such a protein is again classified as integral, the next most hydrophobic region is 
considered. 
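
The iterative procedure can be illustrated as follows. The sketch works on a list of per-residue hydrophobicity values and takes the integral/peripheral decision as a caller-supplied function, so that it does not commit to any particular discriminant. The names `find_membrane_segments` and `is_integral` are hypothetical, and the details (fixed window length, how ties are broken) are assumptions rather than a description of ALOM2.

```python
def find_membrane_segments(values, is_integral, window=17):
    """Repeatedly cut out the most hydrophobic window while it is classified as integral.

    values      -- per-residue hydrophobicities of the protein
    is_integral -- callable taking maxH and returning True for category I
    Returns (first, last) 0-based original indices of each removed block, in the
    order found. After a removal the flanking residues are joined, as in the text,
    so a later block may straddle an earlier cut.
    """
    remaining = list(enumerate(values))  # keep each residue's original index
    segments = []
    while len(remaining) >= window:
        means = [
            sum(v for _, v in remaining[k:k + window]) / window
            for k in range(len(remaining) - window + 1)
        ]
        best = max(range(len(means)), key=means.__getitem__)
        if not is_integral(means[best]):
            break  # the remaining sequence is no longer classified as integral
        block = remaining[best:best + window]
        segments.append((block[0][0], block[-1][0]))
        del remaining[best:best + window]
    return segments
```

With a simple illustrative rule such as `lambda h: h > 1.58` (the value at which the linear expression quoted above changes sign), a sequence with two strongly hydrophobic stretches yields two segments, the more hydrophobic one first.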


Reference: Klein, P., Kanehisa, M., DeLisi, C. (1985) The detection and 
classification of membrane-spanning proteins. Biochim. Biophys. Acta 815: 468-476 


Transcription factors 

Purified eukaryotic RNA polymerase II (RNAPII) is unable to initiate promoter-specific 
transcription. A family of factors that collectively confer RNAPII promoter specificity is 
known as the general transcription factors (GTFs). They include the TATA-binding protein 
(TBP), TFIIB, TFIIE, TFIIF and TFIIH. These factors are conserved among all eukaryotes. 

RNAPII complexes containing the entire set of GTFs or a subset of GTFs together 
with other proteins have been isolated from mammalian and yeast cells. Although purified 
RNAPII and GTFs are sufficient for promoter-specific initiation, this system fails to respond 
to activators. The response to activators is mediated by a further complex, termed the 
mediator complex, which associates with the carboxy-terminal heptapeptide domain (CTD) of 
the largest subunit of RNAPII. 

Purification of human RNAPII complexes and analysis of their functional properties 
revealed two distinct forms of human RNAPII. One complex contained chromatin remodeling 
activities but was devoid of GTFs. The other complex did not contain factors that modify 
chromatin but contained a subset of SRB/mediator subunits and GTFs and other polypeptides 
that mediate transcriptional activation, a scenario similar to that reported for yeast. 

A complex designated NAT (~20 SU), for negative regulator of transcription, contains 
RNAPII, Cdk8, homologs of the yeast mediator complex as well as Rgr1 and Srb10/11, 
known as negative regulators of transcription. 

A complex with strikingly similar structural and functional properties to NAT has been 
identified, designated SMCC (~15 SU) (SRB/mediator coactivator complex), that can also 
mediate transcriptional activation. 

The SMCC complex includes all reported NAT subunits including subunits of the 
TRAP complex. TRAP is a coactivator complex isolated on the basis of its interaction with 
the thyroid hormone receptor. Another coactivator complex, DRIP, isolated on the basis of its 
ability to interact with the vitamin D3 receptor, contains novel subunits as well as subunits of 
NAT/SMCC and TRAP complexes. 

The effects of each of these coactivator complexes are dependent on the TFIID 
complex. It is not known if the TAF subunits of TFIID are required. It is likely that new 
coactivator complexes will be uncovered containing both novel and previously defined 
components. 

Besides the large number of transcription factors which can be part of the RNAPII 
holoenzyme or the coactivator complexes, there is an even larger number of specific 
transcription factors that bind to promoter elements within the DNA sequence of a given gene, 
leading to activation or repression of transcription. A broad range of cellular responses such as 
differentiation, proliferation, cell death and others are elicited through activating or 
repressing the transcription of target genes. 

There are at least five superclasses of transcription factors: 

1. Superclass contains members with characteristic basic domains: 

Members are: 

Leucine zipper factors, where the basic domain is followed by a leucine zipper of 
repeated leucine residues at every seventh position. The zipper mediates protein dimerization 
as a prerequisite for DNA-binding. 

Helix-loop-helix factors (bHLH) contain a DNA-binding basic region followed by a 
motif of two potential amphipathic alpha-helices connected by a loop of variable length also 
mediating dimerization. 

Factors with a combination of Helix-loop-helix and leucine zipper. 

Further members of this superclass are NF-1, RF-X, and bHSH like proteins. 
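
The leucine zipper described above, with a leucine at every seventh position, lends itself to a simple positional scan. The sketch below is illustrative only: the function name `find_leucine_heptads` is hypothetical, and the minimum of four consecutive heptad leucines is an assumed threshold that the text does not specify. A hit indicates a candidate zipper and does not establish an amphipathic helix or dimerization.

```python
def find_leucine_heptads(seq, min_repeats=4):
    """Return 1-based start positions where Leu occurs at every 7th residue.

    A position i is reported when residues i, i+7, i+14, ... (min_repeats of
    them in total) are all leucine.
    """
    seq = seq.upper()
    span = 7 * (min_repeats - 1)  # distance from the first to the last leucine
    return [
        i + 1
        for i in range(len(seq) - span)
        if all(seq[i + 7 * j] == "L" for j in range(min_repeats))
    ]
```

A run of five heptad leucines is reported twice (starting at the first and at the second leucine), since both positions begin a run of at least four.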

2. Superclass comprises factors containing zinc-coordinating DNA-binding domains 

Members are: 

Proteins with Cys4 zinc finger of nuclear receptor type, where two such motifs 
differing in size, composition and function are present in each receptor molecule. Each finger 
comprises 4 cysteine residues coordinating one zinc ion. The second half including the 
second cysteine pair has alpha-helix conformation and the helix of the first finger binds to the 
DNA through the major groove. The sequence between the first two cysteines of the second 
finger mediates dimerization upon DNA-binding. This class includes the steroid hormone 
receptors and the thyroid hormone receptor-like factors. Other diverse Cys4 zinc fingers have 
a motif of GATA-type. 

Proteins with Cys2His2 zinc finger domain(s). Each finger comprises 2 cysteine and 2 
histidine residues coordinating one zinc ion, and in some cases one histidine is replaced by 
another cysteine. The zinc ion is essential for DNA-binding. 

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine residues coordinate two zinc 
ions, i. e. two of the thiol groups are coordinating two zinc ions each. Present in many fungal 
regulators. 

Zinc fingers of alternating composition. 

3. Superclass contains factors of helix-turn-helix type. 

Members are: 

Proteins with homeo domains. Homeo domains are three consecutive alpha-helix 
structures. Helix 3 mainly contacts the major groove of the DNA; some contacts at the minor 
groove are observed as well. Helices 2 and 3 resemble the helix-turn-helix structure of 
prokaryotic regulators. 

Proteins with Paired box domain(s). This is a DNA-binding domain of approximately
130 amino acid residues. Its N-terminal half is basic, its C-terminal half is highly charged in
general. It probably comprises 3 alpha-helices.

Proteins with Fork head / winged helix domain(s). This domain was identified by
homology between HNF-3A and fkh. The domain comprises approx. 110 AA. Analysis of the
crystal structure has revealed a compact structure of three alpha-helices, the third alpha-helix
being exposed towards the major groove of the DNA. The domain also exerts minor groove
contacts. Upon binding to DNA, it induces a bend of 13 degrees.

Heat shock factors 

Proteins with Tryptophan clusters. The tryptophan clusters comprise several 
tryptophan residues with a spacing of 12-21 amino acid residues; the subclass of myb-type 
DNA-binding domains typically exhibits a spacing of 19-21 amino acid residues.

Proteins with TEA domain(s). The TEA domain has been identified as a region which 
is conserved among the transcription factors TEF-1, TEC1 and abaA. This domain in TEF-1
has been shown to interact with DNA, although two additional regions may also contribute to
DNA-binding. It is predicted to fold into three alpha-helices, with a randomly coiled region of
16-18 amino acid residues between helices 1 and 2, and a short stretch between helices 2 and 
3 of 3-8 residues. 
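
The tryptophan-cluster criterion given above (tryptophans spaced 12-21 residues apart, with myb-type domains typically at 19-21) lends itself to a simple check. The sketch below is a hedged illustration: it assumes that "spacing" means the difference in sequence position between consecutive tryptophans and that "several" means at least three, and the function names are hypothetical.

```python
def tryptophan_spacings(seq):
    """Return distances (in residues) between consecutive Trp (W) positions."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "W"]
    return [b - a for a, b in zip(positions, positions[1:])]


def classify_trp_cluster(seq, min_trp=3):
    """Classify a sequence by the spacing of all its tryptophans.

    min_trp=3 is an assumption standing in for "several" in the text.
    """
    spacings = tryptophan_spacings(seq)
    if len(spacings) + 1 < min_trp:
        return "none"
    if all(19 <= s <= 21 for s in spacings):
        return "myb-like"
    if all(12 <= s <= 21 for s in spacings):
        return "tryptophan cluster"
    return "none"


if __name__ == "__main__":
    # Hypothetical sequence with three tryptophans spaced 19 residues apart.
    demo = "W" + "A" * 18 + "W" + "A" * 18 + "W"
    print(tryptophan_spacings(demo), classify_trp_cluster(demo))
```

Because the check considers every tryptophan in the input, it is best applied to a candidate domain region rather than to a full-length protein.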

4. Superclass contains beta-Scaffold Factors with Minor Groove Contacts

Members are: 

Proteins with RHR (Rel homology) region. 

The structure of the Rel-type DBD exhibits a bipartite subdomain structure, each 
subdomain comprising a beta-barrel with five loops that form an extensive contact surface to 
the major groove of the DNA. Particularly, the first loop of the N-terminal subdomain (the 
highly conserved recognition loop) performs contacts with the recognition element on the 
DNA, but other loops are involved. The fact that the main DNA-contacts are made through 
loops has been suggested to provide a high degree of flexibility in binding to a range of 
different target sequences. Augmenting interactions are achieved by two alpha-helices within
the N-terminal part that form strong minor groove contacts to the A/T-rich center of the
κB-element. In p65, the sequence between both alpha-helices is much shorter and even helix 2 is
truncated. The second, C-terminal domain is necessary mainly for protein dimerization.

p53 proteins

MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of this class comprise
a region of homology. The DNA-binding domain also comprises the dimerization capability.
In the DNA-bound dimer (shown for SRF), two antiparallel amphipathic alpha-helices
(alpha-I) form a coiled coil and are oriented approximately parallel on the minor groove. These
helices make minor and major groove contacts; the N-terminal extensions form minor groove
contacts. The bound DNA is bent and wrapped around the protein. It exhibits a compressed
minor groove in the center and widened minor groove in the flanks.

Beta-Barrel alpha-helix transcription factors. 

TATA-binding proteins 

HMG proteins 

Proteins of this class comprise a region of homology with the chromosomal
non-histone HMG proteins such as HMG1. This region comprises the DNA-binding domain
which in some instances such as HMG1 mediates sequence-unspecific, in other cases such as
LEF-1 sequence-specific binding to DNA. This domain exhibits a typical L-shaped
conformation made up of 3 alpha-helices and an extended N-terminal extension of the first
helix. The latter together with helix 1, which contains a kink, form the long arm of the L,
whereas helices 1 and 2 form the short arm. Binding to the minor groove induces a sharp
bending of the DNA by more than 90 degrees, away from the bound protein. The overall
topology of the DNA-protein complexes resembles somewhat that of the TBP-TATA box 
complex. 

Heteromeric CCAAT factors

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain proteins are characterized by a highly 
conserved region first found in prokaryotic cold-shock proteins. This domain is a
single-stranded nucleic acid-binding structure interacting with DNA or RNA. It consists of an
antiparallel five-stranded beta-barrel, the strands of which are connected by turns and loops.
Within this structure, a three-stranded beta-sheet contains a conserved RNA-binding motif,
RNP1. Not all CSD proteins are transcription factors. Those which specifically bind to a
certain sequence are termed Y-box proteins. Proteins of this class were previously called
protamine-like domain proteins because of having a highly positively charged domain with 
interspersed proline residues. 

Proteins with Runt homology domain 

The members of this transcription factor class have been identified on the basis of 
their homology to a defined region within the Drosophila protein Runt. The runt domain is
part of the DNA-binding domain of these factors. It consists mainly of beta-strands, does not
contain alpha-helical regions and seems to be most similar to the palm domain found in DNA 
polymerase beta (rat). 

5. Superclass contains other transcription factors like Copper fist proteins, HMGI(Y),
STAT, Pocket domain proteins and AP2/EREBP-related factors.

The classification of transcription factors originates from the TRANSFAC database:
http://transfac.gbf.de/TRANSFAC/
Reference: Heinemeyer 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Transcription Factors" and include, among others, the following:

Dcoh: Dcoh is a bifunctional protein, complexed with biopterin. It serves as
dimerization cofactor of hepatocyte nuclear factor-1 and catalyzes the dehydration of the
biopterin cofactor of phenylalanine hydroxylase. The Dcoh protein has been reported by
OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or related,
etc.) with the following diseases: 1) hyperphenylalaninemia (OMIM 126090, #264070).
Clones in this category include fkd2_46kl2.

Signal transducing proteins: Beta-transducin subunits of G-proteins contain WD-40 
repeats. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. Due to the zinc finger the novel protein 
seems to be a new molecule involved in signal transduction and transcription. These proteins 
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) essential hypertension
(OMIM *139130). Clones in this category include utel_li2.

* * *

The invention, therefore, specifically contemplates the following assemblages of 
materials, which track the above-identified fourteen functional groupings, that are useful in 
practicing the profiling aspects of the invention. One type of assemblage is nucleic acid- 
based and can include the following groupings of sequences and their derivatives: all 
sequences; human fetal brain sequences; brain derived sequences; human fetal kidney 
library sequences; kidney derived sequences; human mammary carcinoma library 
sequences; mammary carcinoma derived sequences; human testis library sequences; testes
derived sequences; cell cycle genes; cell structure and motility genes; differentiation and
development genes; intracellular transport and trafficking genes; metabolism genes; nucleic 
acid management genes; signal transduction genes; transmembrane protein genes; and 
transcription factor genes. Other assemblages contain proteins or their corresponding 
antibodies or antibody fragments, divided along the same groupings. 

Database Applications 

Because they are human genes and gene products, the inventive molecules are useful 
as members of a database. Such a database may be used, for example, in drug discovery 
and rational drug design or in testing the novelty and non-obviousness of newly sequenced
materials. In addition, they are particularly suited in designing variants for the profiling 
(and other) applications described herein. Hence, the following discussion of electronic 
embodiments applies equally to such variants, which, naturally, will be generated and 
stored using a computer using known methodologies. 

Accordingly, one aspect of the invention contemplates a database of at least one of 
the inventive sequences stored on computer readable media. Again, the individual 
sequences may be grouped with regard to the individual functional and structural groups 
mentioned above. While the individual sequences of a database may exist in printed form, 
they are preferably in electronic form, as in an ASCII or a text file. They may also exist as
word processing files or they may be stored in database applications like DB2, Sybase, 
Oracle, GCG and GenBank. One skilled in the art will understand the range of applications 
suitable for using and storing the electronic embodiments of the invention. 
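
As a hedged illustration of the text-file embodiment described above, the following sketch groups sequence records by functional group and writes one group to a plain-text, FASTA-style file. The record fields (`id`, `group`, `seq`), the clone identifiers and the header layout are hypothetical and chosen only for this example; the specification does not prescribe a file format.

```python
import io
from collections import defaultdict


def group_records(records):
    """Group sequence records by their functional-group label."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["group"]].append(rec)
    return dict(groups)


def write_fasta(records, handle, width=60):
    """Write records as FASTA-style text: a '>' header line, then wrapped sequence."""
    for rec in records:
        handle.write(f">{rec['id']} group={rec['group']}\n")
        seq = rec["seq"]
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical records; identifiers and sequences are placeholders.
    records = [
        {"id": "clone_a", "group": "transcription factor", "seq": "ATGC" * 20},
        {"id": "clone_b", "group": "signal transduction", "seq": "GGCC"},
    ]
    groups = group_records(records)
    buffer = io.StringIO()  # stands in for a text file opened in write mode
    write_fasta(groups["transcription factor"], buffer)
    print(buffer.getvalue())
```

The same grouping could equally be held in a relational database table keyed on the group label; a flat text file is used here only because it is the simplest form the text mentions.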

"Computer readable media" refers to any medium which can be read and accessed 
by a computer. These include: magnetic storage media, like floppy discs, hard drives and 
magnetic tape; optical storage media, like CD-ROM; electrical storage media, like RAM
and ROM; and hybrids of these categories, like magnetic/optical storage media. One
skilled in the art will readily understand the scope of computer readable media and how to 
implement them. 

Biological Activities and Assays for Implementing Therapeutic and Diagnostic 
Applications 

This section provides assays for biological activity that are useful in characterizing
and quantifying the biological activity of the inventive molecules and their derivatives,
which is relevant to the pharmacological effects of the inventive molecules. As used in this
section, it will be understood that "protein" may also refer to the inventive antibodies
(including fragments). 

Cytokine and Cell Proliferation/Differentiation Activity

A protein of the present invention may exhibit cytokine, cell proliferation (either 
inducing or inhibiting) or cell differentiation (either inducing or inhibiting) activity or may 
induce production of other cytokines in certain cell populations. Many protein factors 
discovered to date, including all known cytokines, have exhibited activity in one or more 
factor dependent cell proliferation assays, and hence the assays serve as a convenient 
confirmation of cytokine activity. The activity of a protein of the present invention is 
evidenced by any one of a number of routine factor dependent cell proliferation assays for
cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3,
MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and
CMK.

The activity of a protein of the invention may, among other means, be measured by
the following methods:

Assays for T-cell or thymocyte proliferation include, without limitation, those
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D.
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter
7, Immunologic studies in Humans); Takai et al., J. Immunol. 137:3494-3500, 1986;
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992;
Bowman et al., J. Immunol. 152:1756-1761, 1994.

Assays for cytokine production and/or proliferation of spleen cells, lymph node cells
or thymocytes include, without limitation, those described in: Polyclonal T cell stimulation,
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; and
Measurement of mouse and human interferon gamma, Schreiber, R. D. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and
Sons, Toronto. 1994.

Assays for proliferation and differentiation of hematopoietic and lymphopoietic cells
include, without limitation, those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and Lipsky, P. E. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and
Sons, Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al.,
Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A.
80:2931-2938, 1983; Measurement of mouse and human interleukin 6, Nordan, R. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and
Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986;
Measurement of human Interleukin 11, Bennett, F., Giannotti, J., Clark, S. C. and Turner,
K. J. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.15.1, John
Wiley and Sons, Toronto. 1991; Measurement of mouse and human Interleukin 9, Ciarletta,
A., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J.
E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991.

Assays for T-cell clone responses to antigens (which will identify, among others, 
proteins that affect APC-T cell interactions as well as direct T-cell effects by measuring 
proliferation and cytokine production) include, without limitation, those described in: 
Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H.
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function; Chapter 6,
Cytokines and their cellular receptors; Chapter 7, Immunologic studies in Humans);
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al.,
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai
et al., J. Immunol. 140:508-512, 1988.

Immune Stimulating or Suppressing Activity

A protein of the present invention may also exhibit immune stimulating or immune 
suppressing activity, including without limitation the activities for which assays are 
described herein. A protein may be useful in the treatment of various immune deficiencies 
and disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating
(up or down) growth and proliferation of T and/or B lymphocytes, as well as effecting the
cytolytic activity of NK cells and other cell populations. These immune deficiencies may be
genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may
result from autoimmune disorders. More specifically, infectious diseases caused by viral,
bacterial, fungal or other infection may be treatable using a protein of the present invention,
including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania
spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this
regard, a protein of the present invention may also be useful where a boost to the immune
system generally may be desirable, i.e., in the treatment of cancer. 

Autoimmune disorders which may be treated using a protein of the present invention
include, for example, connective tissue disease, multiple sclerosis, systemic lupus
erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre
syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis,
graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein of the
present invention may also be useful in the treatment of allergic reactions and conditions,
such as asthma (particularly allergic asthma) or other respiratory problems. Other
conditions, in which immune suppression is desired (including, for example, organ
transplantation), may also be treatable using a protein of the present invention.

Using the proteins of the invention it may also be possible to modify immune
responses in a number of ways. Down regulation may be in the form of inhibiting or
blocking an immune response already in progress or may involve preventing the induction
of an immune response. The functions of activated T cells may be inhibited by suppressing
T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression
of T cell responses is generally an active, non-antigen-specific, process which requires
continuous exposure of the T cells to the suppressive agent. Tolerance, which involves
inducing non-responsiveness or anergy in T cells, is distinguishable from
immunosuppression in that it is generally antigen-specific and persists after exposure to the
tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T
cell response upon reexposure to specific antigen in the absence of the tolerizing agent.

Down regulating or preventing one or more antigen functions (including without 
limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing 
high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, 
skin and organ transplantation and in graft-versus-host disease (GVHD). For example, 
blockage of T cell function should result in reduced tissue destruction in tissue 
transplantation. Typically, in tissue transplants, rejection of the transplant is initiated 
through its recognition as foreign by T cells, followed by an immune reaction that destroys 
the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 
lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, 
monomeric form of a peptide having B7-2 activity alone or in conjunction with a 
monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1,
B7-3) or blocking antibody), prior to transplantation can lead to the binding of the
molecule to the natural ligand(s) on the immune cells without transmitting the
corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner
prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an
immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize
the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B 
lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of 
these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, 
it may also be necessary to block the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection 
or GVHD can be assessed using animal models that are predictive of efficacy in humans. 
Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats 
and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to 
examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in 
Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA,
89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the
effect of blocking B lymphocyte antigen function in vivo on the development of that
disease.

Blocking antigen function may also be therapeutically useful for treating
autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation 
of T cells that are reactive against self tissue and which promote the production of cytokines 
and autoantibodies involved in the pathology of the diseases. Preventing the activation of 
autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents 
which block costimulation of T cells by disrupting receptor:ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation and prevent production of
autoantibodies or T cell-derived cytokines which may be involved in the disease process.
Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T
cells which could lead to long-term relief from the disease. The efficacy of blocking
reagents in preventing or alleviating autoimmune disorders can be determined using a
number of well-characterized animal models of human autoimmune diseases. Examples
include murine experimental autoimmune encephalitis, systemic lupus erythematosus in
MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes
mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul
ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856).

Upregulation of an antigen function (preferably a B lymphocyte antigen function), as 
a means of up regulating immune responses, may also be useful in therapy. Upregulation of
immune responses may be in the form of enhancing an existing immune response or
eliciting an initial immune response. For example, enhancing an immune response through
stimulating B lymphocyte antigen function may be useful in cases of viral infection. In
addition, systemic viral diseases such as influenza, the common cold, and encephalitis
might be alleviated by the administration of stimulatory forms of B lymphocyte antigens
systemically. 

Alternatively, anti-viral immune responses may be enhanced in an infected patient
by removing T cells from the patient, costimulating the T cells in vitro with viral
antigen-pulsed APCs either expressing a peptide of the present invention or together with a
stimulatory form of a soluble peptide of the present invention and reintroducing the in vitro
activated T cells into the patient. Another method of enhancing anti-viral immune responses
would be to isolate infected cells from a patient, transfect them with a nucleic acid encoding
a protein of the present invention as described herein such that the cells express all or a
portion of the protein on their surface, and reintroduce the transfected cells into the patient.
The infected cells would now be capable of delivering a costimulatory signal to, and
thereby activate, T cells in vivo.

In another application, up regulation or enhancement of antigen function (preferably
B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor 
cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) 
transfected with a nucleic acid encoding at least one peptide of the present invention can be 
administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the 
tumor cell can be transfected to express a combination of peptides. For example, tumor 
cells obtained from a patient can be transfected ex vivo with an expression vector directing 
the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide 
having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned
to the patient to result in expression of the peptides on the surface of the transfected cell. 
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in 
vivo. 

The presence of the peptide of the present invention having the activity of a B 
lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation 
signal to T cells to induce a T cell mediated immune response against the transfected tumor 
cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or 
which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can
be transfected with nucleic acid encoding all or a portion of (e.g., a cytoplasmic-domain
truncated portion) of an MHC class I alpha chain protein and beta 2 microglobulin protein
or an MHC class II alpha chain protein and an MHC class II beta chain protein to thereby
express MHC class I or MHC class II proteins on the cell surface. Expression of the
appropriate class I or class II MHC in conjunction with a peptide having the activity of a B
lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response
against the transfected tumor cell. Optionally, a gene encoding an antisense construct which
blocks expression of an MHC class II associated protein, such as the invariant chain, can
also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte
antigen to promote presentation of tumor associated antigens and induce tumor specific
immunity. Thus, the induction of a T cell mediated immune response in a human subject
may be sufficient to overcome tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Suitable assays for thymocyte or splenocyte cytotoxicity include, without limitation,
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Herrmann et al., Proc.
Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol.
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al.,
J. Virology 61:1992-1998; Bertagnolli et al., Cellular Immunology 133:327-341, 1991;
Brown et al., J. Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses and isotype switching 
(which will identify, among others, proteins that modulate T-cell dependent antibody
responses and that affect Th1/Th2 profiles) include, without limitation, those described in:
Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for B cell function: In vitro
antibody production, Mond, J. J. and Brunswick, M. In Current Protocols in Immunology.
J. E. e.a. Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994.

Mixed lymphocyte reaction (MLR) assays (which will identify, among others,
proteins that generate predominantly Th1 and CTL responses) include, without limitation,
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Takai et al., J. Immunol. 
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. 
Immunol. 149:3778-3783, 1992. 

Dendritic cell-dependent assays (which will identify, among others, proteins
expressed by dendritic cells that activate naive T-cells) include, without limitation, those
described in: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of
Experimental Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology
154:5071-5079, 1995; Porgador et al., Journal of Experimental Medicine 182:255-260,
1995; Nair et al., Journal of Virology 67:4062-4069, 1993; Huang et al., Science
264:961-965, 1994; Macatonia et al., Journal of Experimental Medicine 169:1255-1264, 1989;
Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al.,
Journal of Experimental Medicine 172:631-640, 1990.

Assays for lymphocyte survival/apoptosis (which will identify, among others, 
proteins that prevent apoptosis after superantigen induction and proteins that regulate 
lymphocyte homeostasis) include, without limitation, those described in: Darzynkiewicz et 
al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et
al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk,
Journal of Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993;
Gorczyca et al., International Journal of Oncology 1:639-648, 1992. 

Assays for proteins that influence early steps of T-cell commitment and development 
include, without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine
et al., Cellular Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995;
Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.

Hematopoiesis Regulating Activity

A protein of the present invention may be useful in regulation of hematopoiesis and,
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal
biological activity in support of colony forming cells or of factor-dependent cell lines
indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and
proliferation of erythroid progenitor cells alone or in combination with other cytokines,
thereby indicating utility, for example, in treating various anemias or for use in conjunction
with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or
erythroid cells; in supporting the growth and proliferation of myeloid cells such as
granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for
example, in conjunction with chemotherapy to prevent or treat consequent
myelo-suppression; in supporting the growth and proliferation of megakaryocytes and
consequently of platelets thereby allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in place of or complementary to
platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic
stem cells which are capable of maturing to any and all of the above-mentioned
hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such
as those usually treated with transplantation, including, without limitation, aplastic anemia
and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell
compartment post irradiation/chemotherapy, either in-vivo or ex-vivo (i.e., in conjunction
with bone marrow transplantation or with peripheral progenitor cell transplantation
(homologous or heterologous)) as normal cells or genetically manipulated for gene therapy.

The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Suitable assays for proliferation and differentiation of various hematopoietic lines 
are cited above. 

Assays for embryonic stem cell differentiation (which will identify, among others, 
proteins that influence embryonic differentiation hematopoiesis) include, without limitation, 
those described in: Johansson et al., Cellular Biology 15:141-151, 1995; Keller et al., 
Molecular and Cellular Biology 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 
1993. 

Assays for stem cell survival and differentiation (which will identify, among others, 
proteins that regulate lympho-hematopoiesis) include, without limitation, those described 
in: Methylcellulose colony forming assays, Freshney, M. G. In Culture of Hematopoietic 
Cells. R. I. Freshney, et al. eds. Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 
1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; Primitive 
hematopoietic colony forming cells with high proliferative potential, McNiece, I. K. and 
Briddell, R. A. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 23-39, 
Wiley-Liss, Inc., New York, N.Y. 1994; Neben et al., Experimental Hematology 
22:353-359, 1994; Cobblestone area forming cell assay, Ploemacher, R. E. In Culture of 
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 1-21, Wiley-Liss, Inc., New York, 
N.Y. 1994; Long term bone marrow cultures in the presence of stromal cells, Spooncer, 
E., Dexter, M. and Allen, T. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. 
Vol pp. 163-179, Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture initiating cell 
assay, Sutherland, H. J. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol 
pp. 139-162, Wiley-Liss, Inc., New York, N.Y. 1994. 

Tissue Growth Activity 

A protein of the present invention also may have utility in compositions used for 
bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for 
wound healing and tissue repair and replacement, and in the treatment of burns, incisions 
and ulcers. 

A protein of the present invention, which induces cartilage and/or bone growth in 
circumstances where bone is not normally formed, has application in the healing of bone 
fractures and cartilage damage or defects in humans and other animals. Such a preparation 
employing a protein of the invention may have prophylactic use in closed as well as open 
fracture reduction and also in the improved fixation of artificial joints. De novo bone 
formation induced by an osteogenic agent contributes to the repair of congenital, trauma 
induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic 
plastic surgery. 

A protein of this invention may also be used in the treatment of periodontal disease, 
and in other tooth repair processes. Such agents may provide an environment to attract 
bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of 
progenitors of bone-forming cells. A protein of the invention may also be useful in the 
treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or 
cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase 
activity, osteoclast activity, etc.) mediated by inflammatory processes. 

Another category of tissue regeneration activity that may be attributable to the 
protein of the present invention is tendon/ligament formation. A protein of the present 
invention, which induces tendon/ligament-like tissue or other tissue formation in 
circumstances where such tissue is not normally formed, has application in the healing of 
tendon or ligament tears, deformities and other tendon or ligament defects in humans and 
other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein 
may have prophylactic use in preventing damage to tendon or ligament tissue, as well as 
use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing 
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced 
by a composition of the present invention contributes to the repair of congenital, trauma 
induced, or other tendon or ligament defects of other origin, and is also useful in cosmetic 
plastic surgery for attachment or repair of tendons or ligaments. The compositions of the 
present invention may provide an environment to attract tendon- or ligament-forming cells, 
stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors 
of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or 
progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the 
invention may also be useful in the treatment of tendonitis, carpal tunnel syndrome and 
other tendon or ligament defects. The compositions may also include an appropriate matrix 
and/or sequestering agent as a carrier as is well known in the art. 

The protein of the present invention may also be useful for proliferation of neural 
cells and for regeneration of nerve and brain tissue, i.e. for the treatment of central and 
peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic 
disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. 
More specifically, a protein may be used in the treatment of diseases of the peripheral 
nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized 
neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's 
disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. 
Further conditions which may be treated in accordance with the present invention include 
mechanical and traumatic disorders, such as spinal cord disorders, head trauma and 
cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from 
chemotherapy or other medical therapies may also be treatable using a protein of the 
invention. 

Proteins of the invention may also be useful to promote better or faster closure of 
non-healing wounds, including without limitation pressure ulcers, ulcers associated with 
vascular insufficiency, surgical and traumatic wounds, and the like. 

It is expected that a protein of the present invention may also exhibit activity for 
generation or regeneration of other tissues, such as organs (including, for example, 
pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) 
and vascular (including vascular endothelium) tissue, or for promoting the growth of cells 
comprising such tissues. Part of the desired effects may be by inhibition or modulation of 
fibrotic scarring to allow normal tissue to regenerate. A protein of the invention may also 
exhibit angiogenic activity. 

A protein of the present invention may also be useful for gut protection or 
regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, 
and conditions resulting from systemic cytokine damage. 

A protein of the present invention may also be useful for promoting or inhibiting 
differentiation of tissues described above from precursor tissues or cells; or for inhibiting 
the growth of tissues described above. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for tissue generation activity include, without limitation, those described in: 
International Patent Publication No. WO95/16035 (bone, cartilage, tendon); International 
Patent Publication No. WO95/05846 (nerve, neuronal); International Patent Publication 
No. WO91/07491 (skin, endothelium). 

Assays for wound healing activity include, without limitation, those described in: 
Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H. I. and Rovee, D. T., eds.), 
Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. 
Invest. Dermatol. 71:382-84 (1978). 

Activin/Inhibin Activity 

A protein of the present invention may also exhibit activin- or inhibin-related 
activities. Inhibins are characterized by their ability to inhibit the release of follicle 
stimulating hormone (FSH), while activins are characterized by their ability to 
stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present 
invention, alone or in heterodimers with a member of the inhibin alpha family, may be 
useful as a contraceptive based on the ability of inhibins to decrease fertility in female 
mammals and decrease spermatogenesis in male mammals. Administration of sufficient 
amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein 
of the invention, as a homodimer or as a heterodimer with other protein subunits of the 
inhibin-beta group, may be useful as a fertility inducing therapeutic, based upon the ability 
of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for 
example, U.S. Pat. No. 4,798,885. A protein of the invention may also be useful for 
advancement of the onset of fertility in sexually immature mammals, so as to increase the 
lifetime reproductive performance of domestic animals such as cows, sheep and pigs. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for activin/inhibin activity include, without limitation, those described in: 
Vale et al., Endocrinology 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale 
et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., 
Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986. 

Chemotactic/Chemokinetic Activity 

A protein of the present invention may have chemotactic or chemokinetic activity 
(e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, 
fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. 
Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell 
population to a desired site of action. Chemotactic or chemokinetic proteins provide 
particular advantages in treatment of wounds and other trauma to tissues, as well as in 
treatment of localized infections. For example, attraction of lymphocytes, monocytes or 
neutrophils to tumors or sites of infection may result in improved immune responses against 
the tumor or infecting agent. 

A protein or peptide has chemotactic activity for a particular cell population if it can 
stimulate, directly or indirectly, the directed orientation or movement of such cell 
population. Preferably, the protein or peptide has the ability to directly stimulate directed 
movement of cells. Whether a particular protein has chemotactic activity for a population of 
cells can be readily determined by employing such protein or peptide in any known assay 
for cell chemotaxis. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for chemotactic activity (which will identify proteins that induce or prevent 
chemotaxis) consist of assays that measure the ability of a protein to induce the migration of 
cells across a membrane as well as the ability of a protein to induce the adhesion of one cell 
population to another cell population. Suitable assays for movement and adhesion include, 
without limitation, those described in: Current Protocols in Immunology, Ed by J. E. 
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene 
Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and 
beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et 
al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., 
J. of Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol. 153:1762-1768, 1994. 

Hemostatic and Thrombolytic Activity 

A protein of the invention may also exhibit hemostatic or thrombolytic activity. As a 
result, such a protein is expected to be useful in treatment of various coagulation disorders 
(including hereditary disorders, such as hemophilias) or to enhance coagulation and other 
hemostatic events in treating wounds resulting from trauma, surgery or other causes. A 
protein of the invention may also be useful for dissolving or inhibiting formation of 
thromboses and for treatment and prevention of conditions resulting therefrom (such as, for 
example, infarction of cardiac and central nervous system vessels (e.g., stroke)). 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for hemostatic and thrombolytic activity include, without limitation, those 
described in: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., 
Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, 
Prostaglandins 35:467-474, 1988. 

Receptor/Ligand Activity 

A protein of the present invention may also demonstrate activity as receptors, 
receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such 
receptors and ligands include, without limitation, cytokine receptors and their ligands, 
receptor kinases and their ligands, receptor phosphatases and their ligands, receptors 
involved in cell-cell interactions and their ligands (including without limitation, cellular 
adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs 
involved in antigen presentation, antigen recognition and development of cellular and 
humoral immune responses). Receptors and ligands are also useful for screening of 
potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A 
protein of the present invention (including, without limitation, fragments of receptors and 
ligands) may itself be useful as an inhibitor of receptor/ligand interactions. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for receptor-ligand activity include, without limitation, those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. 
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and 
Wiley-Interscience (Chapter 7.28, Measurement of Cellular Adhesion under static conditions 
7.28.1-7.28.22); Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et 
al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 
1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 
80:661-670, 1995. 

Anti-Inflammatory Activity 

Proteins of the present invention may also exhibit anti-inflammatory activity. The 
anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the 
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for 
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the 
inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or 
suppressing production of other factors which more directly inhibit or promote an 
inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory 
conditions (including chronic or acute conditions), including without limitation inflammation 
associated with infection (such as septic shock, sepsis or systemic inflammatory response 
syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, 
complement-mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, 
inflammatory bowel disease, Crohn's disease or resulting from over production of 
cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat 
anaphylaxis and hypersensitivity to an antigenic substance or material. 


Tumor Inhibition Activity 

In addition to the activities described above for immunological treatment or 
prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A 
protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). 
A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor 
precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such 
as, for example, by inhibiting angiogenesis), by causing production of other factors, agents 
or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting 
factors, agents or cell types which promote tumor growth. 

Other Activities 

A protein of the invention may also exhibit one or more of the following additional 
activities or effects: inhibiting the growth, infection or function of, or killing, infectious 
agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting 
(suppressing or enhancing) bodily characteristics, including, without limitation, height, 
weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ 
or body part size or shape (such as, for example, breast augmentation or diminution, 
change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; 
affecting the fertility of male or female subjects; affecting the metabolism, catabolism, 
anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, 
carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); 
affecting behavioral characteristics, including, without limitation, appetite, libido, stress, 
cognition (including cognitive disorders), depression (including depressive disorders) and 
violent behaviors; providing analgesic effects or other pain reducing effects; promoting 
differentiation and growth of embryonic stem cells in lineages other than hematopoietic 
lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of 
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative 
disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for 
example, the ability to bind antigens or complement); and the ability to act as an antigen in 
a vaccine composition to raise an immune response against such protein or another 
material or entity which is cross-reactive with such protein. 


Particular Applications for Certain Clones 

The following sets out a non-exclusive list of applications for certain embodiments of 
the invention. In the interest of economy, applications relevant to multiple embodiments are 
not duplicated in this list. Other embodiments described below have similar 
characteristics, as described therein. The artisan is directed, therefore, to this section for 
similar descriptions of the functions of other embodiments. 

Testes 

htes3_15c24: The new protein can find application in modulation of 2-hydroxyacid 
dehydrogenases-dependent pathways and as a new enzyme for biotechnologic 
production processes. 

htes3_15i5: The new protein can find application in modulating the structure of the 
human spermatozoa radial spoke head and modulation of sperm motility in men. 

htes3_15k11: The novel protein contains a protein kinase ATP-binding region 
signature and a serine/threonine protein kinase active-site signature. The new protein 
can find application in modulation of intracellular signal pathways dependent on this 
kinase. 

htes3_17n12: The new protein can find application in modulating/blocking the 
expression of SOX-controlled genes. 

htes3_20k2: The new protein can find application as a target for the development of 
new nociception-modulating drugs. 

htes3_20m18: The new protein can find application in modulation of mitochondrial 
DNA replication and maintenance. 

htes3_20d4: The new protein can find application in the regulation of gene 
expression by activation of nuclear GTP-binding proteins. X-linked retinitis 
pigmentosa is the result of a defective GTPase regulator, which contains an RCC1-type 
repeat. 

htes3_21j15: NY-CO-33 is a protein recognised by autologous antibodies of human 
colon cancer patients. The novel protein contains 4 C2H2 zinc fingers and is a new 
putative transcription factor. The new protein can find application in 
modulating/blocking the expression of genes controlled by this transcription factor. 

The new protein can find application in modulating chromosome transport in mitosis 
and meiosis and modulation of cell division. 

htes3_26g22: The new protein can find application in modulating chromosome 
transport in mitosis and meiosis and modulation of cell division. The novel TBP- 
binding protein is considered to participate in transcription regulation through the 
interaction with TBP. The new protein can find application in modulation of gene 
transcription. 

htes3_21l16: The new protein can find application in modulation of protein 
translocation into the endoplasmic reticulum. 

htes3_27d1: The novel protein can find application in modulation of ubiquitin- and 
protein metabolism in cells. 

htes3_2m18: The novel protein can find application as multifunctional nuclease / 
exoribonuclease. 

htes3_35b4: The new protein can find application in modulation of the mitotic 
spindle. 

htes3_35b5: The novel protein can find application in modulating the v-ATPase 
activity in endocytic and secretory organelles. 

htes3_35e21: Due to the close relationship to human interleukin-7, the novel 
interleukin is expected to act as a new growth factor for human B lineage cells. 
Additionally, the protein should induce the gene rearrangement of the T-cell receptor 
repertoire, leading to thymocyte commitment, and subsequently induce both cytotoxic 
T-cell- and lymphocyte-activated killer cells. This new interleukin could find clinical 
application in a variety of conditions of hematolymphopoietic failure and different 
tumours, because of its recruitment of B cell lineage cells, cytotoxic T-cell- and 
lymphocyte-activated killer cells. 

htes3_35k16: Therefore it is a new fatty acid-CoA synthetase/ligase with unknown 
substrate. The new protein can find application in modulation of fatty acid 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35n12: The new protein can find application in modulation of ADP-transport 
and energy metabolism in cells/mitochondria. 

htes3_35n9: The new protein can find application in modulation of carboxylester 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35p22: The novel protein is closely related to human tre-2 and other enzymes 
involved in the degradation of ubiquitinated proteins. The human tre-2 oncogene 
encodes a deubiquitinating enzyme, indicating a role for the ubiquitin system in 
mammalian growth control. The novel protein can find application in cancer 
diagnostics and treatment, and in regulating protein stability and growth control via 
regulation of ubiquitination. 

htes3_4h6: The novel kinesin protein can find application in modulating the function 
of kinesin and modulating intracellular transport via/on microtubules. 

htes3_72k15: FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel 
F-actin-binding protein. The gene locus fgd1 seems to be responsible for faciogenital 
dysplasia or Aarskog-Scott syndrome. Frabin binds F-actin and shows 
F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 cells and COS7 cells induces 
cell shape change and c-Jun N-terminal kinase activation, as described for FGD1. 
Because FGD1 has been shown to serve as a GDP/GTP exchange protein for Cdc42 
small G protein, it is likely that frabin is a direct linker between Cdc42 and the actin 
cytoskeleton. Cdc42p is an esin yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated 
protein morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin- 
dependent events and induces the JNK/SAPK protein kinase cascade, which leads to 
the activation of transcription factors within the nucleus. The novel protein seems to 
be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as 
well as modulation of the JNK/SAPK pathway. 

htes3_72p16: As Mem3, the novel protein is similar to yeast VPS (vacuolar protein 
sorting) 35. The null allele of VPS35 results in yeast in a differential defect in the 
sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), proteinase B 
(PrB), and alkaline phosphatase (ALP). The new protein can find application in 
modulating the sorting of proteins into different compartments. 

htes3_7b22: The novel protein is related to paramyosin, a major structural component 
of thick filaments and invertebrate muscle. Paramyosins are promising antigens for 
immunization against several parasites, such as Schistosoma mansoni. The new 
protein can find application in modulating cell adhesion/motility and 
membrane/cytoskeleton structure and dynamics. 

htes3_7j3: The new protein is closely related to C-TAK1 and therefore should be 
involved in cell-cycle regulation, too. The new protein can find application in 
modulating/blocking the cell cycle. 

htes3_7p9: The nuclear domain (ND)10, also described as POD or Kr bodies, is 
involved in the development of acute promyelocytic leukemia and virus-host 
interactions. The NDP52 protein is part of this complex structure. In vivo, NDP52 is 
transcribed in all human tissues, but is redistributed upon viral infection and interferon 
treatment. ND10 plays an important role in the viral life cycle. The novel protein is 
similar to NDP52. It contains three leucine zippers and an RGD cell attachment site. 
This protein seems to be a novel part of the ND10 complex. The new protein can 
find application in modulation of viral infections and tumour events. 

htes3_8m10: The poly(A)-binding protein (PABP) binds to the messenger RNA (mRNA) 
3'-poly(A) tail found on most eukaryotic mRNAs and together with the poly(A) tail 
has been implicated in governing the stability and the translation of mRNA. The new 
protein can find application in modulation of mRNA translation and 
processing/stability. 

Kidney 

hfkd2_24b15: The new protein can find application in modulation of hexose 
metabolism pathways and as a new enzyme for biotechnologic production processes. 

hfkd2_24n20: The new protein seems to be part of the signalling pathway between 
tyrosine kinases and the membrane/cytoskeleton. The new protein can find 
application in modulating cell adhesion/motility and membrane/cytoskeleton 
structure and dynamics. 

hfkd2_3o17: The new protein can find application in modulation of the respiratory 
electron transport chain pathways of mitochondria. 

hfkd2_46j20: The new protein can find application in modulating the 
homoprotocatechuate degradative pathway and as an enzyme for biotechnologic 
production processes. 

hfkd2_46k19: The new protein can find application in modulating/blocking the 
expression of genes controlled by the hepatocyte nuclear factor-1. 

hfkd2_46m4: SAR1 proteins are involved in vesicular transport between the 
endoplasmic reticulum and the Golgi apparatus. 

hfkd2_46k14: rab6 is a ubiquitous ras-like GTPase involved in intra-Golgi transport. 
The new protein can find application in modulating the transport of vesicles inside the 
Golgi apparatus. 

Uterus Associated: 

hute1_18i19: The SREBP-2 protein is embedded in the membranes of the nucleus and 
endoplasmic reticulum. In cholesterol-depleted cells the proteins are cleaved to release 
soluble NH2-terminal fragments that enter the nucleus and activate genes encoding 
the low density lipoprotein receptor and enzymes of cholesterol synthesis. The new 
protein is a putative transcription factor capable of protein-protein interaction via a 
LIM domain and additionally shows similarity to the common sunflower transcription 
factor SF3. 

hute1_1811: The novel protein is similar to several 40S ribosomal proteins and 
therefore seems to be part of the corresponding ribosome subunit. 

hute1_19g22: The new protein can find application in modulation of tissue 
calcification, especially in the uterus. 

hute1_19h17: The new protein can find application in modulating the response of 
cells to oxysterols. 

hute1_20b19: The novel protein seems to be a novel enzyme with sarcosine oxidase 
activity. The new protein can find application in modulation of sarcosine metabolism 
and as a new enzyme for biotechnologic production processes. 

hute1_20g21: The novel protein seems to be a new ras inhibitor protein. The new 
protein can find application in modulating/blocking ras dependent signal transduction 
pathways. 

hute1_20h13: The novel protein is a new human alpha-adaptin. The new protein can 
find application in modulating endocytosis and vesicle trafficking in cells. 

hute1_20m11: The new protein can find application in modulating/blocking the 
activity of protein phosphatase-1 and in modulating the cell cycle. 

hute1_20m24: This protein is a putative mannosyl transferase that is involved in the 
assembly of the core oligosaccharide Glc3Man9GlcNAc2. The new protein can find 
application in modulation of glycosylation of proteins and as a new enzyme for 
biotechnologic production processes. 

hute1_22e12: The new protein can find application in modulating the 
cornichon-modulated signal transduction pathway and also the EGF receptor signaling processes. 

hute1_23e13: The novel protein contains a serine protease of the subtilase family with 
an aspartic acid-containing active site. The new protein can find application in 
modulation of proteinase activity in cells and as a new enzyme for proteomics and 
biotechnologic production processes. 

hute1_24j6: The new protein can find application in modulation of cell-cell adhesion. 

hute1_24h3: The new protein can find application as a useful marker for 
chondro-osteogenic cell differentiation and for the modulation of chondro-osteogenic cell 
differentiation. 

Fetal Brain: 

hfbr2_16c16: The new protein can find application in modulating/blocking of 
cytoskeleton-membrane protein interaction. 

hfbr2_23b21: The new protein can find application in modulating/blocking the 
guanylate cyclase-pathway. 

hfbr2_23b10: The new protein can find application in modulation of splicing. 

hfbr2_2b5: The novel protein contains the typical (xxG)n repeat of collagen proteins 
and a Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a 
new collagen alpha chain. The new protein can find application in modulation of 
connective tissue, bone and cartilage development and maintenance. 
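
As an illustration only, the two sequence features named in this list, the (xxG)n 
collagen repeat and the RGD cell attachment site, can be located in a one-letter amino 
acid sequence with a short script. The following is a minimal sketch, not the method used 
to annotate the clones: the function names, the minimum repeat count of five, and the 
example sequence are all hypothetical. 

```python
import re


def find_xxg_runs(seq, min_repeats=5):
    """Return (start, end, repeats) for each run of consecutive xxG triplets.

    min_repeats is an arbitrary illustrative threshold, not a value from the text.
    """
    pattern = re.compile(r"(?:[A-Z]{2}G){%d,}" % min_repeats)
    return [(m.start(), m.end(), (m.end() - m.start()) // 3)
            for m in pattern.finditer(seq.upper())]


def find_motif(seq, motif="RGD"):
    """Return every 0-based start index of motif in seq (overlaps allowed)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


# Hypothetical toy sequence: six PPG triplets followed by an RGD tripeptide.
example = "MK" + "PPG" * 6 + "RGDA"
print(find_xxg_runs(example))
print(find_motif(example))
```

A real annotation would more likely rely on curated domain profiles (such as the Pfam 
models mentioned above) than on a bare regular expression; the sketch only shows what 
the textual description of the motifs amounts to. 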

hfbr2_2c17: The new protein can find application in modulating/blocking 
G-protein-dependent pathways. 

hfbr2_2d15: The new protein can find application in modulating early 
spermatogenesis. 

hfbr2_2i17: The new protein can find clinical application in modulating the transport 
of glycoproteins inside cells, especially of the LDL receptor. 

hfbr2_2k14: Tumour-suppressor genes are known to be involved in the control of cell 
growth and division, interacting with proteins which control the cell cycle. The N33 
gene is significantly methylated in tumour cells, a mechanism by which tumor- 
suppressor genes are inactivated in cancer. In addition, the novel protein contains a 
RGD cell attachment site. Therefore the novel protein is a new putative tumour- 
suppressor gene. 

hfbr_3cl8: RNA helicases comprise a large family of proteins that are involved in 
basic biological systems such as nuclear and mitochondrial splicing processes, RNA 
editing, rRNA processing, translation initiation, nuclear mRNA export, and mRNA
degradation. RNA helicases are essential factors in cell development and 
differentiation, and some of them play a role in transcription and replication of viral 
single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this 
subgroup.

hfbr_3g8: The new protein can find application in modulating NAT assembly and action
and may therefore be important in the metabolism of drugs and environmental mutagens.

hfbr2_62bl 1: The rac small GTPase is associated with type-I phosphatidylinositol
4-phosphate 5-kinase and regulates the production of phosphatidylinositol
4,5-bisphosphate. The new protein is expected to activate p21rac-related small GTPases.

hfbr2_62ol7: The new protein can find application in modulation of cholesterol 
binding and transport by LDL-receptors and LDL-binding proteins. 

hfbr_6b24: The new protein can find application in modulation of rhamnose 
metabolism and as a new enzyme for biotechnologic production processes. 

hfbr_72bl8: The new protein can find application in modulating DNA repair and 
mutagenesis. 

hfbr_78c4: The new protein can find application in modulating/blocking the response 
of cells to interferons. 

hfbr_78k24: These enzymes are involved in the processing of poly-ubiquitin
precursors as well as that of ubiquitinated proteins. The new protein can find
application in modulation of protein stability/degradation in cells. 

hfbr_82e4: The new protein can find clinical application in modulating/blocking 
calmodulin-mediated pathways in human neuronal cells. 

VARIANTS OF THE INVENTIVE DNA MOLECULES 
Variants in General

"Variants," according to the invention, include DNA and/or protein molecules that
resemble, structurally and/or functionally, those set forth herein. Variants may be isolated
from natural sources ("homologs"), may be entirely synthetic or may be based in part on both 
natural and synthetic approaches. 

The section set forth below presents various structural and functional characteristics of
molecules within the invention. Preferred molecules are characterized by a combination of
one or more of these characteristics. For instance, some preferred molecules are described
with reference to at least two structural characteristics, while others may be described with
reference to at least one structural and at least one functional characteristic.

It will be recognized by the skilled artisan that structure ultimately defines function, 
i.e. the functions of the molecules described herein derive from the structures of those
molecules. Accordingly, the structural variants described below that bear the closest
structural relationship (as variously defined below) to the inventive molecules are the variants 
that most likely will preserve biological function. This relationship between structure and 
function will guide the skilled artisan in identifying the preferred embodiments of the 
invention. 

Splicing Variants 

It is well known that eukaryotic structural genes comprise both protein-coding
and non-coding portions. When the messenger RNA is transcribed from the DNA template,
it contains introns, which are non-coding, and exons, which are coding. In order to form a
translation-competent mRNA, the introns must be "spliced" out of this initial pre-mRNA.

Specific sequences within the pre-mRNA represent "splice junctions" that direct the
cellular splicing machinery to the appropriate position. The splice junctions are loosely
conserved sequence regions of the pre-mRNA, which almost invariably begin with GT and
end with AG (DNA perspective). The 5' end of the splice junction typically contains about
nine somewhat conserved residues, for example, C/AAGTA/GAGT. The 3' end usually
contains a pyrimidine-rich stretch of at least about 11 nucleotides, followed by NC/TAGG.
Splicing occurs before the GT and after the AG. Mount, Nucleic Acids Res. 10:459-72
(1982).

Interestingly, exons often correspond to discrete functional domains of the protein
product. The intron/exon arrangement thus creates a linear array of nucleotides which can be
correlated to discrete, and often interchangeable, functional protein fragments. Go, Nature
291:90-92 (1981); Branden et al., EMBO J. 3:1307-10 (1984). This linear arrangement
creates the possibility of generating multiple different full length proteins by rearranging the
order of the different functional portions in the array. For example, if a set of exons is
arranged 1-2-3-4, where (-) represents the introns separating the exons, a splicing event need
not simply produce 1234, but may produce 123, 134, 124 and so on. Production of different
mRNA products in this way is commonly called "alternative splicing." Andreadis et al., Ann.
Rev. Cell Biol. 3:207-42 (1987).
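
The combinatorial point of the 1-2-3-4 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: exons always keep their genomic order, any subset of exons may be skipped, and the function name splice_isoforms and the min_exons parameter are hypothetical names introduced here, not part of the disclosure.

```python
from itertools import combinations


def splice_isoforms(exons, min_exons=2):
    """List every order-preserving selection of exons (assumed model).

    `exons` is a list of exon labels in genomic order. Each returned string
    is one hypothetical mature mRNA, written as the concatenated labels of
    the exons it retains. Longer isoforms are listed first.
    """
    isoforms = []
    for count in range(len(exons), min_exons - 1, -1):
        # combinations() keeps the input order, so exon order is preserved.
        for kept in combinations(exons, count):
            isoforms.append("".join(kept))
    return isoforms


if __name__ == "__main__":
    # Exons arranged 1-2-3-4, keeping at least three exons per product.
    print(splice_isoforms(["1", "2", "3", "4"], min_exons=3))
```

With four exons and at least three retained, the sketch lists 1234 together with 123, 124, 134 and 234, which covers the products named in the text. Real splicing is constrained by splice-site strength and reading frame, which this enumeration ignores.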

Some of the present DNA molecules can be represented in modular fashion in terms of
their coding regions. Essentially, these modules are exons (though each "exon" may in fact
be made up of several exons), which may be combined in different ways to form a variety of
different DNA molecules, each encoding a different functional protein. Splicing variants are
indicated below.


Degenerate Variants 

One aspect of the present invention provides "degenerate variants" of the nucleic acid 
fragments of the present invention. A "degenerate variant" is a nucleotide fragment which 
differs from those of inventive molecules by nucleotide sequence, but due to the degeneracy 
of the genetic code, encodes an identical polypeptide sequence. 

Given the known relationship between DNA sequences and the proteins they encode, 
degenerate variants typically are described by reference to this relationship. It is well known 
that the degeneracy of the genetic code results in many possible DNA sequences which
encode a particular protein. Indeed, of the three bases which comprise an amino
acid-encoding triplet, the third position, and often the second, almost always may vary. This fact
alone allows for a class of variant DNA molecules which encode protein sequences identical 
to those disclosed herein, yet have about 30% sequence variation. In other words, the variant 
DNA molecules are about 70% identical to the inventive DNAs, having no additional or 
deleted sequences. Thus, one aspect of the invention provides degenerate variant DNA 
molecules encoding the inventive protein sequences. 

In one embodiment, these variants have at least about 70% sequence identity with the 
DNA molecules described herein. In a preferred embodiment, these variants have at least 
about 80% sequence identity to the inventive molecules. In a more preferred embodiment 
these variants have at least about 90% sequence identity with the inventive molecules. 
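
As a rough illustration of how a degenerate variant can keep the encoded polypeptide while lowering nucleotide identity, the following Python sketch recodes each codon with a synonymous codon that differs at as many positions as possible. It assumes the standard genetic code and an in-frame coding sequence; the function names (translate, most_divergent_variant, percent_identity) are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def most_divergent_variant(dna):
    """Recode each codon with the synonymous codon that differs the most."""
    recoded = []
    usable = len(dna) - len(dna) % 3
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        recoded.append(max(synonyms, key=lambda c: mismatches(c, codon)))
    return "".join(recoded)


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * (len(a) - mismatches(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGGCTCTGAGC"  # encodes M-A-L-S
    variant = most_divergent_variant(original)
    print(variant, translate(variant), percent_identity(original, variant))
```

For this short toy sequence the variant still encodes the same four residues. The identity reached on a real coding sequence depends on its amino acid composition, since residues such as methionine and tryptophan have a single codon.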

Conservative Amino Acid Variants 

Variants according to the invention also may be made that conserve the overall
molecular structure of the encoded proteins. Given the properties of the individual amino
acids comprising the disclosed protein products, some rational substitutions will be recognized
by the skilled worker. Amino acid substitutions, i.e., "conservative substitutions," may be
made, for instance, on the basis of similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues involved. 

For example: (a) nonpolar (hydrophobic) amino acids include alanine, leucine,
isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; (b) polar neutral
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine;
(c) positively charged (basic) amino acids include arginine, lysine, and histidine; and (d)
negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Substitutions
typically may be made within groups (a)-(d). In addition, glycine and proline may be
substituted for one another based on their ability to disrupt α-helices. Similarly, certain
amino acids, such as alanine, cysteine, leucine, methionine, glutamic acid, glutamine,
histidine and lysine are more commonly found in α-helices, while valine, isoleucine,
phenylalanine, tyrosine, tryptophan and threonine are more commonly found in β-pleated
sheets. Glycine, serine, aspartic acid, asparagine, and proline are commonly found in turns.
Some preferred substitutions may be made among the following groups: (i) S and T; (ii) P and
G; and (iii) A, V, L and I. Given the known genetic code, and recombinant and synthetic
DNA techniques, the skilled scientist readily can construct DNAs encoding the conservative
amino acid variants.

As used herein, "sequence identity" between two polypeptide sequences indicates the 
percentage of amino acids that are identical between the sequences. "Sequence similarity" 
indicates the percentage of amino acids that either are identical or that represent conservative 
amino acid substitutions. 
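
The distinction between "sequence identity" and "sequence similarity" can be made concrete with a small Python sketch. It assumes the two polypeptide sequences are already aligned to equal length and treats a substitution as conservative only when both residues fall in the same one of groups (a)-(d) listed above; the names GROUPS, is_conservative and identity_and_similarity are illustrative, not taken from the disclosure.

```python
# Groups (a)-(d) from the text, written in one-letter amino acid code.
GROUPS = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def is_conservative(a, b):
    """True if a and b are different residues belonging to the same group."""
    return a != b and a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)


def identity_and_similarity(seq1, seq2):
    """Return (percent identity, percent similarity) for pre-aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    identical = sum(a == b for a, b in zip(seq1, seq2))
    conservative = sum(is_conservative(a, b) for a, b in zip(seq1, seq2))
    total = len(seq1)
    return 100.0 * identical / total, 100.0 * (identical + conservative) / total


if __name__ == "__main__":
    # M and L are identical, K->R is conservative (both basic), E->A is not.
    print(identity_and_similarity("MKLE", "MRLA"))
```

Similarity is never lower than identity, because every identical position also counts toward similarity. Gap characters, if present, are counted as neither identical nor conservative in this sketch.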

Functionally Equivalent Variants 

Yet another class of DNA variants within the scope of the invention may be described 
with reference to the product they encode. As shown below, some of the inventive DNA 
molecules encode a protein having a degree of homology with known proteins, or protein
domains. It is expected, therefore, that they will have some or all of the requisite functional
features of such molecules. The products of these "functionally equivalent variants" are
characterized by the fact that they are functionally equivalent, with respect to biological 
activity, to certain known molecules. 

The instant invention provides information on common structural motifs, including
consensus sequences that will guide the artisan in constructing functionally equivalent 
variants. It will be understood that the motifs, identified for each inventive protein, may be 
modified within the identified consensus sequences. Thus, the invention contemplates the 
proteins disclosed herein that contain variability in the consensus sequences identified, and the 
invention further contemplates the full range of nucleic acids encoding them, and the 
complements of those nucleic acids.

Hybridizing Variants

DNA variants within the invention also may be described by reference to their 
physical properties in hybridization. One skilled in the field will recognize that DNA can be 
used to identify its complement and, since DNA is double stranded, its equivalent or 
homolog, using nucleic acid hybridization techniques. It will also be recognized that 
hybridization can occur with less than 100% complementarity. However, given appropriate 
choice of conditions, hybridization techniques can be used to differentiate among DNA 
sequences based on their structural relatedness to a particular probe. For guidance regarding 
such conditions see, for example, Sambrook et al., 1989, MOLECULAR CLONING, A
LABORATORY MANUAL, Cold Spring Harbor Press, N.Y.; and Ausubel et al., 1989,
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Greene Publishing Associates and
Wiley Interscience, N.Y.

Structural relatedness between two polynucleotide sequences can be expressed as a 
function of "stringency" of the conditions under which the two sequences will hybridize with 
one another. As used herein, the term "stringency" refers to the extent that the conditions 
disfavor hybridization. Stringent conditions strongly disfavor hybridization, and only the 
most structurally related molecules will hybridize to one another under such conditions. 
Conversely, non-stringent conditions favor hybridization of molecules displaying a lesser 
degree of structural relatedness. Hybridization stringency, therefore, directly correlates with 
the structural relationships of two nucleic acid sequences. The following relationships are 
useful in correlating hybridization and relatedness (where T_m is the melting temperature of a
nucleic acid duplex):

a. T_m = 69.3 + 0.41(G+C)%

b. The T_m of a duplex DNA decreases by 1°C with every increase of 1% in the
number of mismatched base pairs.

c. (T_m)_μ2 - (T_m)_μ1 = 18.5 log10(μ2/μ1)

where μ1 and μ2 are the ionic strengths of two solutions.
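
The three relationships can be combined into a small calculator. The Python sketch below is illustrative only: it reads relationship (a) as T_m = 69.3 + 0.41 x (percent G+C), takes the decrease in (b) as 1°C per 1% of mismatched base pairs, and reads (c) as a shift of 18.5 x log10(μ2/μ1) between two ionic strengths. The function names are hypothetical.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temp(seq, mismatch_percent=0.0):
    """Relationship (a), lowered by 1 degree C per 1% mismatch as in (b)."""
    return 69.3 + 0.41 * gc_percent(seq) - 1.0 * mismatch_percent


def ionic_strength_shift(mu1, mu2):
    """Relationship (c): T_m at ionic strength mu2 minus T_m at mu1."""
    return 18.5 * math.log10(mu2 / mu1)


if __name__ == "__main__":
    probe = "GCGCATAT"  # 50% G+C
    print(melting_temp(probe))                        # perfectly matched duplex
    print(melting_temp(probe, mismatch_percent=10))   # 10% mismatched base pairs
    print(ionic_strength_shift(0.1, 1.0))             # tenfold higher ionic strength
```

The sketch makes the qualitative statements in the text quantitative: a higher G+C content or ionic strength raises the melting temperature, while each percent of mismatch lowers it, which is why stringent conditions select for closely related sequences.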

Hybridization stringency is a function of many factors, including overall DNA 
concentration, ionic strength, temperature, probe size and the presence of agents which 
disrupt hydrogen bonding. Factors promoting hybridization include high DNA
concentrations, high ionic strengths, low temperatures, longer probe size and the absence of
agents that disrupt hydrogen bonding. 

Hybridization usually is done in two stages. First, in the "binding" stage, the probe is
bound to the target under conditions favoring hybridization. Stringency is usually controlled
at this stage by altering the temperature. For high stringency, the temperature is usually
between 65°C and 70°C, unless short (<20 nt) oligonucleotide probes are used. A
representative hybridization solution comprises 6X SSC, 0.5% SDS, 5X Denhardt's solution
and 100ng of non-specific carrier DNA. See Ausubel et al., supra, section 2.9, supplement
27 (1994). Of course many different, yet functionally equivalent, buffer conditions are
known. Where the degree of relatedness is lower, a lower temperature may be chosen. Low
stringency binding temperatures are between about 25°C and 40°C. Medium stringency is
between at least about 40°C to less than about 65°C. High stringency is at least about 65°C.

Second, the excess probe is removed by washing. It is at this stage that more stringent 
conditions usually are applied. Hence, it is this "washing" stage that is most important in 
determining relatedness via hybridization. Washing solutions typically contain lower salt 
concentrations. One exemplary medium stringency solution contains 2X SSC and 0.1% SDS.
A high stringency wash solution contains the equivalent (in ionic strength) of less than about
0.2X SSC, with a preferred stringent solution containing about 0.1X SSC. The temperatures
associated with various stringencies are the same as discussed above for "binding." The
washing solution also typically is replaced a number of times during washing. For example,
typical high stringency washing conditions comprise washing twice for 30 minutes at 55°C
and three times for 15 minutes at 60°C.

The present invention includes nucleic acid molecules that hybridize to the inventive
molecules under high stringency binding and washing conditions. More preferred molecules
(from an mRNA perspective) are those that are at least 50% of the length of any one of those
depicted below. Particularly preferred molecules are at least 75% of the length of those
molecules.

Substitutions, Insertions, Additions and Deletions

In a general sense, the preferred DNA variants of the invention are those that retain 
the closest relationship, as described by "sequence identity" to the inventive DNA molecules. 
According to another aspect of the invention, therefore, substitutions, insertions, additions 
and deletions of defined properties are contemplated. It will be recognized that sequence
identity between two polynucleotide sequences, as defined herein, generally is determined
with reference to the protein coding region of the sequences. Thus, this definition does not at 
all limit the amount of DNA, such as vector DNA, that may be attached to the molecules 
described herein. Preferred DNA sequence variants include molecules encoding proteins 
sharing some or all of any relevant biological activity of the native molecule. 

In creating these variants, the skilled worker will be guided by reference to the protein 
structure. First, insertions and deletions in any recognized functional domain, above, 
generally should be avoided, except as noted below in the section entitled "Proteins," where 
this domain is discussed in detail. Alterations in such domains usually will be limited to 
conservative amino acid substitutions. In addition, where insertions and deletions are desired, 
this may be accomplished at the N- and/or C-terminus of the protein molecule (or the 
corresponding coding regions of the DNA). If insertions or deletions are made within the 
protein, deletions of major structural features usually should be avoided. Thus, a preferred
place to make insertion or deletion variants is in non-structural regions, such as linker regions 
between two alpha helices. 

"Substitutions" generally refer to alterations in the DNA sequence which do not 
change its overall length, but only alter one or more nucleotide positions, substituting one for 
another in the common sense of the word. One class of preferred substitutions, "degenerate
substitutions," are those that do not alter the encoded amino acid sequence. Some substitutions
retain 50%, 55%, 60% or 65% identity. Preferred substitutions retain at least about 70%
identity, more preferably at least 70% or 75% identity, with the inventive DNAs. Some more
preferred molecules have at least about 80% identity, more preferably at least 80% or 85%
identity. Particularly preferred DNAs share at least about 90% identity, more preferably at
least 90% or 95% identity.

"Insertions," unlike substitutions, alter the overall length of the DNA molecule, and 
thus sometimes the encoded protein. Insertions add extra nucleotides to the interior (not the 
5' or 3' ends) of the subject DNAs. Preferred insertions are made with reference to the
protein sequence encoded by the DNA. Thus, it is most preferred to provide an insertion in 
the DNA at a location that corresponds to an area of the encoded protein which lacks 
structure. For instance, it typically would not be beneficial, if the preservation of biological 
activity is desired, to provide an insertion within an alpha-helical region or a beta-pleated 
sheet. Accordingly, non-structural areas, such as those containing helix-breaking glycine
and proline residues, are most preferred sites of insertion. Other preferred sites of insertion
are the splice sites, which are indicated above in the description of the inventive DNA 
molecules. 

While the optimal size of insertions will vary depending upon the site of insertion and
its effect on the overall conformation of the encoded protein, some general guides are useful.
Generally, the total insertions (irrespective of their number) should not add more than about
30% (or preferably not more than 30%) to the overall size of the encoded protein. More
preferably, the insertion adds less than about 10-20% (yet more preferably 10-20%) in size,
with less than about 10% being most preferred. The number of insertions is limited only by
the number of suitable insertion sites, and secondarily by the foregoing size preferences.

"Additions," like insertions, also add to the overall size of the DNA molecule, and
usually the encoded protein. However, instead of being made within the molecule, they are
made on the 5' or 3' end, usually corresponding to the N- or C-terminus of the encoded
protein. Unlike deletions, additions are not very size-dependent. Indeed, additions may be of
virtually any size. Preferred additions, however, do not exceed about 100% of the size of the
native molecule. More preferably, they add less than about 60 to 30% to the overall size,
with less than about 30% being most preferred.

"Deletions" diminish the overall size of the DNA and, therefore, also reduce the size 
of the protein encoded by that DNA. Deletions may be made from either end of the molecule 
or internal to it. Typical preferred deletions remove discrete structural features of the 
encoded protein. For example, some deletions will comprise the deletion of one or more 
exons which may define a structural feature. Preferred deletions remove less than about 30% 
of the size of the subject molecule. More preferred deletions remove less than about 20% and
most preferred deletions remove less than about 10%.

Computer-Defined Variants and Definition of "Sequence Identity"

In general, both the DNA and protein molecules of the invention can be defined with
reference to "sequence identity." As used herein, "sequence identity" refers to a comparison
made between two molecules using, for example, the standard Smith-Waterman algorithm
that is well known in the art.
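
For orientation, a minimal Smith-Waterman local alignment can be sketched as follows. This is not the scoring scheme of any particular program: the match, mismatch and linear gap values are arbitrary assumptions chosen for the example, and the function names are hypothetical. Production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def alignment_identity(top, bottom):
    """Percent of alignment columns that hold identical, non-gap residues."""
    if not top:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return 100.0 * same / len(top)


if __name__ == "__main__":
    score, top, bottom = smith_waterman("GGACGTGG", "TTACGTTT")
    print(score, top, bottom, alignment_identity(top, bottom))
```

The percent identity reported for a pair of sequences depends on the scoring parameters and on how the aligned region is chosen, so figures from this sketch are not interchangeable with those from other alignment programs.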

Some molecules have at least about 50%, 55% or 60% identity. Preferred molecules
are those having at least about 65% sequence identity, more preferably at least 65% or 70%
sequence identity. Other preferred molecules have at least about 80%, more preferably at
least 80% or 85%, sequence identity. Particularly preferred molecules have at least about
90% sequence identity, more preferably at least 90% sequence identity. Most preferred
molecules have at least about 95%, more preferably at least 95%, sequence identity. As used
herein, two nucleic acid molecules or proteins are said to "share significant sequence identity"
if the two contain regions which possess greater than 85% sequence (amino acid or nucleic
acid) identity.

"Sequence identity" is defined herein with reference to the Blast 2 algorithm, which is
available at the NCBI (http://www.ncbi.nlm.nih.gov/BLAST), using default parameters.
References pertaining to this algorithm include: those found at
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html; Altschul, S.F., Gish, W., Miller,
W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol.
215:403-410; Gish, W. & States, D.J. (1993) "Identification of protein coding regions by
database similarity search." Nature Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang,
J. (1996) "Applications of network BLAST server." Meth. Enzymol. 266:131-141; Altschul,
S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997)
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs."
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L. (1997) "PowerBLAST: A
new network BLAST application for interactive or automated sequence analysis and
annotation." Genome Res. 7:649-656.

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive molecules can be constructed in
several different ways. For example, they may be constructed as completely synthetic DNAs.
Methods of efficiently synthesizing oligonucleotides in the range of 20 to about 150
nucleotides are widely available. See Ausubel et al., supra, section 2.11, Supplement 21
(1993). Overlapping oligonucleotides may be synthesized and assembled in a fashion first
reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971); see also Ausubel et al., supra,
Section 8.2. The synthetic DNAs are designed with convenient restriction sites engineered at
the 5' and 3' ends of the gene to facilitate cloning into an appropriate vector.
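
The idea of covering a gene with overlapping oligonucleotides can be sketched in Python. The layout below (oligonucleotides of a fixed length that alternate between the top and bottom strands and overlap their neighbours by a fixed number of bases) is an assumption made for illustration; it is not a description of the cited assembly protocols, and the names design_overlapping_oligos, oligo_len and overlap are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlapping_oligos(seq, oligo_len=40, overlap=20):
    """Tile `seq` with oligos that alternate strands and overlap their neighbours.

    Even-numbered oligos are taken from the top strand as written; odd-numbered
    oligos are the reverse complement of their window, so that neighbouring
    oligos can anneal to each other across the overlap.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start, index = 0, 0
    while True:
        window = seq[start:start + oligo_len]
        oligos.append(window if index % 2 == 0 else reverse_complement(window))
        if start + oligo_len >= len(seq):
            break  # the last window reached the 3' end of the sequence
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    for oligo in design_overlapping_oligos("ACGTACGTAC", oligo_len=6, overlap=2):
        print(oligo)
```

The default oligonucleotide length falls inside the 20 to about 150 nucleotide range mentioned in the text. A practical design would also balance the melting temperatures of the overlaps and add the flanking restriction sites, which this sketch leaves out.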

An alternative method of generating variants is to start with one of the inventive
DNAs and then to conduct site-directed mutagenesis. See Ausubel et al., supra, chapter 8,
Supplement 37 (1997). In a typical method, a target DNA is cloned into a single-stranded
DNA bacteriophage vehicle. Single-stranded DNA is isolated and hybridized with an
oligonucleotide containing the desired nucleotide alteration(s). The complementary strand is
synthesized and the double stranded phage is introduced into a host. Some of the resulting
progeny will contain the desired mutant, which can be confirmed using DNA sequencing. In
addition, various methods are available that increase the probability that the progeny phage
will be the desired mutant. These methods are well known to those in the field and kits are
commercially available for generating such mutants.

ISOLATING HOMOLOGS 

Methods 

By using the sequences disclosed herein as probes or as primers, and techniques such
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
"Homologs" are essentially naturally-occurring variants and include allelic, species-specific
and tissue-specific variants.

Region-specific primers or probes derived from the nucleotide sequence(s) provided
can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies
containing cloned DNA encoding a homolog using known methods (Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990)). Such an application is useful in
diagnostic methods, as described in more detail below, as well as in preparing full-length
DNAs from various sources. The PCR primers are preferably at least 15 bases, and more
preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that
the primer pairs have approximately the same G/C ratio, so that melting temperatures are
approximately the same. As a general guide, the formula 3(G+C) + 2(A+T) = °C is
useful.
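
The primer guidance above lends itself to a quick check in code. The Python sketch below applies the general guide as read here, that is, 3 degrees per G or C and 2 degrees per A or T; the weights are exposed as parameters because the widely used Wallace rule assigns 4 degrees per G or C. The function names and the 2 degree tolerance in pair_is_balanced are assumptions made for this example.

```python
def primer_tm(primer, gc_weight=3, at_weight=2):
    """Estimate a primer melting temperature in degrees C from base counts."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return gc_weight * gc + at_weight * at


def pair_is_balanced(forward, reverse, max_tm_diff=2, min_len=15):
    """Check the length preference and that both primers melt at similar temperatures."""
    long_enough = len(forward) >= min_len and len(reverse) >= min_len
    return long_enough and abs(primer_tm(forward) - primer_tm(reverse)) <= max_tm_diff


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCAT"  # 18 bases: 8 G/C and 10 A/T
    rev = "TACGTACGTACGTACGTA"  # 18 bases: 8 G/C and 10 A/T
    print(primer_tm(fwd), primer_tm(rev), pair_is_balanced(fwd, rev))
    print(primer_tm(fwd, gc_weight=4))  # same primer under the Wallace rule
```

Such base-count rules are rough guides for short oligonucleotides only; they ignore salt concentration and sequence context.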

When using primers derived from the inventive sequences, one skilled in the art will 
recognize that by employing high stringency conditions (e.g., annealing at 50-60°C), only 
sequences with greater than 75% sequence identity to the primer will be amplified. By 
employing lower stringency conditions (e.g., annealing at 35-37°C), sequences which have
greater than 40-50% sequence identity to the primer also will be amplified. 

The PCR product may be subcloned and sequenced to confirm that it indeed displays 
the expected sequence identity. The PCR fragment may then be used to isolate a full length
cDNA clone by a variety of methods. For example, the amplified fragment may be labeled
and used to screen a bacteriophage cDNA library. Alternatively, the labeled fragment may be
used to screen a genomic library. 

PCR technology may also be utilized to isolate full length cDNA sequences. For 
example, RNA may be isolated, following standard procedures, from an appropriate cellular 
or tissue source. A reverse transcription reaction may be performed on the RNA using an 
oligonucleotide primer specific for the most 5' end of the amplified fragment for the priming
of first strand synthesis. The resulting RNA/DNA hybrid may then be "tailed" with guanines
using a standard terminal transferase reaction, the hybrid may be digested with RNase H,
and second strand synthesis may then be primed with a poly-C primer. Thus, cDNA
sequences upstream of the amplified fragment may easily be isolated. For a review of cloning
strategies which may be used, see, e.g., Sambrook et al., 1989, supra.

When using DNA probes derived from the inventive sequences for colony/plaque
hybridization, one skilled in the art will recognize that by employing medium to high
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions with greater than 90%
sequence identity to the probe can be obtained, and that by employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at
42°C in SSPC), sequences having regions with greater than 35-45% sequence identity to the
probe will be obtained.

Suitably, genomic or cDNA libraries can be constructed and screened in accord with 
the previous paragraph. The libraries should be derived from a tissue or organism that is
known to express the gene of interest, or that is suspected of expressing the gene. The clone 
containing the homolog may then be purified through methods routinely practiced in the art, 
and subjected to sequence analysis. 

Additionally, an expression library can be constructed utilizing DNA isolated from or
cDNA synthesized from a tissue or organism that is known to express the gene of interest, or 
that is suspected of expressing the gene. In this manner, clones may be induced and screened 
using standard antibody screening techniques in conjunction with antibodies raised against the 
normal gene product, as described herein. (For screening techniques, see, for example, 
Harlow, E. and Lane, eds., 1988, ANTIBODIES: A LABORATORY MANUAL, Cold
Spring Harbor Press, Cold Spring Harbor.)

Human Homologs

Any organism or tissue can be used as the source for homologs of the present
invention so long as the organism or tissue naturally expresses such a protein or contains 
genes encoding the same. The most preferred organism for isolating homologs is human. 

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is encoded by the inventive DNA 
molecules presented. Other proteins according to the invention are those encoded by the 
DNA variants described above. As noted, these variants are designed with the encoded 
proteins in mind. 

A preferred class of protein fragments includes those fragments which retain any 
biological activity. These molecules share functional features common to the family of proteins,
although these characteristics may vary in degree. 

According to one aspect of the invention fragments of the inventive proteins are 
contemplated. Some preferred fragments are those which are capable of eliciting an immune 
response. Generally these "antigenic" fragments will be from about five amino acids in 
length to about fifty amino acids in length. Some preferred antigenic fragments are from five 
to about twenty amino acids long. "Antigenic" response may refer to a T cell response, a B
cell response or a response by cells of the macrophage/monocyte lineages. In most cases,
however, it will refer to the immune response involved in the generation of antibodies. In
other words, the relevant immune response is that of helper T cells and/or B cells. These
preferred molecules comprise one or more T cell and/or B cell epitopes.
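
The length ranges given for antigenic fragments can be turned into a simple enumeration of candidate peptides. The Python sketch below only lists contiguous fragments within a length window; it makes no prediction about which fragments would actually elicit an immune response, and the function name candidate_fragments is hypothetical.

```python
def candidate_fragments(protein, min_len=5, max_len=50):
    """Yield (start index, fragment) for every contiguous fragment in the length range.

    The defaults follow the "about five" to "about fifty" amino acid range in
    the text; use max_len=20 for the narrower preferred range.
    """
    longest = min(max_len, len(protein))
    for length in range(min_len, longest + 1):
        for start in range(len(protein) - length + 1):
            yield start, protein[start:start + length]


if __name__ == "__main__":
    # Toy eight-residue sequence, fragments of five or six residues only.
    for start, fragment in candidate_fragments("MKTLDEAG", min_len=5, max_len=6):
        print(start, fragment)
```

In practice such a list would be filtered further, for example by epitope prediction or by surface accessibility, before any fragment is synthesized.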

ANTIBODIES OF THE INVENTION 

Antibodies raised against the proteins and protein fragments of the invention also are 
contemplated by the invention. Described below are antibody products and methods for 
producing antibodies capable of specifically recognizing one or more epitopes of the presently 
described proteins and their derivatives.

Antibodies include, but are not limited to, polyclonal antibodies, monoclonal antibodies
(mAbs), humanized or chimeric antibodies, single chain antibodies including single chain Fv
(scFv) fragments, Fab fragments, F(ab')2 fragments, fragments produced by a Fab expression
library, anti-idiotypic (anti-Id) antibodies, epitope-binding fragments, and humanized forms of
any of the above.

As known to one in the art, these antibodies may be used, for example, in the
detection of a target protein in a biological sample. They also may be utilized as part of
treatment methods, and/or may be used as part of diagnostic techniques whereby patients may
be tested for abnormal levels or for the presence of abnormal forms of such proteins.

In general, techniques for preparing polyclonal and monoclonal antibodies as well as 
hybridomas capable of producing the desired antibody are well known in the art (Campbell, 
A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and 
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St.
Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497
(1975)), and include the trioma technique and the human B-cell hybridoma technique (Kozbor et al.,
Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy,
Alan R. Liss, Inc. (1985), pp. 77-96). Antibodies may also be generated by the known 
techniques of phage display and in vitro immunization. 

Polyclonal Antibodies 

Polyclonal antibodies are heterogeneous populations of antibody molecules derived 
from the sera of animals immunized with an antigen, such as an inventive protein or an
antigenic derivative thereof. 

Polyclonal antiserum, containing antibodies to heterogeneous epitopes of a single 
protein, can be prepared by immunizing suitable animals with the expressed protein described 
above, which can be unmodified or modified, as known in the art, to enhance 
immunogenicity. Immunization methods include subcutaneous or intraperitoneal injection of 
the polypeptide. 

Effective polyclonal antibody production is affected by many factors related both to
the antigen and to the host species. For example, small molecules tend to be less
immunogenic than others and may require the use of carriers and/or adjuvants. In addition,
host animal response may vary with site of inoculation. Both inadequate and excessive doses
of antigen may result in low titer antisera. In general, however, small doses (high ng to low
µg levels) of antigen administered at multiple intradermal sites appear to be most reliable.
Host animals may include but are not limited to rabbits, mice, chickens and rats, to name but
a few. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J.
Clin. Endocrinol. Metab. 33:988-991 (1971).

The protein immunogen may be modified or administered in an adjuvant in order to
increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are 
well known in the art and include, but are not limited to, coupling the antigen with a
heterologous protein (such as globulin or β-galactosidase) or the inclusion of an adjuvant
during immunization. Adjuvants include Freund's (complete and incomplete), mineral gels
such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and
potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and
Corynebacterium parvum.

Booster injections can be given at regular intervals, with at least one usually being
required for optimal antibody production. The antiserum may be harvested when the
antibody titer begins to fall. Titer may be determined semi-quantitatively, for example, by
double immunodiffusion in agar against known concentrations of the antigen. See, for
example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, ed.,
Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2
mg/ml of serum (about 12 µM). The antiserum may be purified by affinity chromatography
using the immobilized immunogen carried on a solid support. Such methods of affinity 
chromatography are well known in the art. 

Affinity of the antisera for the antigen may be determined by preparing competitive
binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical
Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology,
Washington, D.C. (1980). 

In addition to using protein as the immunogen, DNA molecules may be used directly.
In this manner, a DNA encoding the protein immunogen is administered. Boosting and
harvesting are done in a manner analogous to that detailed above. Yet another method of
producing antibodies entails immunizing chickens and harvesting the antibodies from their
eggs. 

Monoclonal Antibodies

Monoclonal antibodies (MAbs) are homogeneous populations of antibodies to a
particular antigen. They may be obtained by any technique which provides for the production
of antibody molecules by continuous cell lines in culture or in vivo. MAbs may be produced
by making hybridomas, which are immortalized cells capable of secreting a specific
monoclonal antibody. 

Monoclonal antibodies to any of the proteins, peptides and epitopes thereof described
herein can be prepared from murine hybridomas according to the classical method of Kohler,
G. and Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No. 4,376,110) or
modifications of the methods thereof, such as the human B-cell hybridoma technique (Kozbor
et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 80:
2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, MONOCLONAL
ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp. 77-96). 

In one method a mouse is repetitively inoculated with a few micrograms of the 
selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody 
producing cells of the spleen are isolated. 

The spleen cells are fused, typically using polyethylene glycol, with mouse myeloma 
cells, such as SP2/0-Agl4 myeloma cells. The excess, unfused cells are destroyed by growth 
of the system on selective media comprising aminopterin (HAT media). The successfully
fused cells are diluted, and aliquots are plated to microtiter plates where growth is continued.

Antibody-producing clones (hybridomas) are identified by detection of antibody in the 
supernatant fluid of the wells by immunoassay procedures. These include ELISA, as 
originally described by Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis,
radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)) and modified methods
thereof. 

Selected positive clones can be expanded and their monoclonal antibody product 
harvested for use. Detailed procedures for monoclonal antibody production are described in 
Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York,
Section 21-2 (1989). The hybridoma clones may be cultivated in vitro or in vivo, for instance
as ascites. Production of high titers of mAbs in vivo makes this the presently preferred 
method of production. Alternatively, hybridoma culture in hollow fiber bioreactors provides 
a continuous high yield source of monoclonal antibodies. 

The antibody class and subclass may be determined using procedures known in the art 
(Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry 
and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).
MAbs may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any
subclass thereof. Methods of purifying monoclonal antibodies are well known in the art. 


Antibody Derivatives and Fragments 

Fragments or derivatives of antibodies include any portion of the antibody which is 
capable of binding the target antigen, or a specific portion thereof. Antibody derivatives 
include poly-specific (e.g., bi-specific) antibodies, which contain binding sites specific for two
or more different epitopes. These epitopes may be from the same or different inventive 
molecules or one or more epitope may be from a molecule not specifically disclosed here. 

Antibody fragments specifically include F(ab')2, Fab, Fab' and Fv fragments. These
can be generated from any class of antibody, but typically are made from IgG or IgM. They
may be made by conventional recombinant DNA techniques or, using the classical method, by
proteolytic digestion with papain or pepsin. See CURRENT PROTOCOLS IN
IMMUNOLOGY, chapter 2, Coligan et al., eds., (John Wiley & Sons 1991-92).

F(ab')2 fragments are typically about 110 kDa (IgG) or about 150 kDa (IgM) and
contain two antigen-binding regions, joined at the hinge by disulfide bond(s). Virtually all, if
not all, of the Fc is absent in these fragments. Fab' fragments are typically about 55 kDa
(IgG) or about 75 kDa (IgM) and can be formed, for example, by reducing the disulfide
bond(s) of an F(ab')2 fragment. The resulting free sulfhydryl group(s) may be used to
conveniently conjugate Fab' fragments to other molecules, such as detection reagents (e.g.,
enzymes). 

Fab fragments are monovalent and usually are about 50 kDa (from any source). Fab 
fragments include the light (L) and heavy (H) chain, variable (VL and VH, respectively) and
constant (CL and CH, respectively) regions of the antigen-binding portion of the antibody. The H
and L portions are linked by an intramolecular disulfide bridge. 

Fv fragments are typically about 25 kDa (regardless of source) and contain the 
variable regions of both the light and heavy chains (VL and VH, respectively). Usually, the VL
and VH chains are held together only by non-covalent interactions and, thus, they readily
dissociate. They do, however, have the advantage of small size and they retain the same
binding properties of the larger Fab fragments. Accordingly, methods have been developed
to crosslink the VL and VH chains, using, for example, glutaraldehyde (or other chemical
crosslinkers), intermolecular disulfide bonds (by incorporation of cysteines) and peptide
linkers. The resulting Fv is now a single chain (i.e., scFv).

Other antibody derivatives include single chain antibodies (U.S. Patent 4,946,778;
Bird, Science 242:423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883
(1988); and Ward et al., Nature 334:544-546 (1989)). Single chain antibodies are formed by
linking the heavy and light chain fragments of the Fv region via an amino acid bridge,
resulting in a single chain Fv (scFv).

One preferred method involves the generation of scFvs by recombinant methods, 
which allows the generation of Fvs with new specificities by mixing and matching variable 
chains from different antibody sources. In a typical method, a recombinant vector would be 
provided which comprises the appropriate regulatory elements driving expression of a cassette 
region. The cassette region would contain a DNA encoding a peptide linker, with convenient 
sites at both the 5' and 3' ends of the linker for generating fusion proteins. The DNA
encoding a variable region(s) of interest may be cloned in the vector to form fusion proteins 
with the linker, thus generating an scFv. 

In an exemplary alternative approach, DNAs encoding two Fvs may be ligated to the 
DNA encoding the linker, and the resulting tripartite fusion may be ligated directly into a 
conventional expression vector. The scFv DNAs generated by any of these methods may be
expressed in prokaryotic or eukaryotic cells, depending on the vector chosen.

Antibody fragments which recognize specific epitopes may be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragments
which can be produced by pepsin digestion of the antibody molecule and the Fab fragments
which can be generated by reducing the disulfide bridges of the F(ab')2 fragments.
Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science, 
246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the 
desired specificity. 

Derivatives also include "chimeric antibodies" (Morrison et al., Proc. Natl. Acad.
Sci., 81:6851-6855 (1984); Neuberger et al., Nature, 312:604-608 (1984); Takeda et al.,
Nature, 314:452-454 (1985)). These chimeras are made by splicing the DNA encoding a
mouse antibody molecule of appropriate specificity with, for instance, DNA encoding a
human antibody molecule of appropriate specificity. Thus, a chimeric antibody is a molecule
in which different portions are derived from different animal species, such as those having a 
variable region derived from a murine mAb and a human immunoglobulin constant region. 
These are also known sometimes as "humanized" antibodies and they offer the added
advantage of at least partial shielding from the human immune system. They are, therefore,
particularly useful in therapeutic in vivo applications.


Labeled Antibodies 

The present invention further provides the above-described antibodies in detectably 
labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms,
etc. Procedures for accomplishing such labeling are well known in the art; see, for example,
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym.
62:308 (1979); Engvall et al., Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215
(1976). The labeled antibodies of the present invention can be used for in vitro, in vivo, and
in situ diagnostic assays. 

Immobilized Antibodies 

The foregoing antibodies also may be immobilized on a solid support. Examples of 
such solid supports include plastics such as polycarbonate, complex carbohydrates such as 
agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads.
Techniques for coupling antibodies to such solid supports are well known in the art (Weir et
al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications,
Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34 Academic Press, N.Y.
(1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo,
and in situ assays as well as for immunoaffinity purification of the proteins of the present 
invention. 

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS

The proteins, antibodies and polynucleotides of the present invention can be 
formulated according to known methods to prepare pharmaceutically useful compositions, 
whereby these materials, or their functional derivatives, are combined in admixture with a 
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive 
of other human proteins, e.g., human serum albumin, are described, for example, in 
Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton, PA (1980)). In
order to form a pharmaceutically acceptable composition suitable for effective administration,
such compositions will contain an effective amount of one or more of the agents of the present
invention, together with a suitable amount of carrier vehicle. 


Pharmaceutical compositions for use in accordance with the present invention may be
formulated in conventional manner using one or more physiologically acceptable carriers or
excipients. Thus, the compounds and their physiologically acceptable salts and solvates may
be formulated for administration by inhalation or insufflation (either through the mouth or the 
nose) or oral, buccal, parenteral or rectal administration. 

For oral administration, the pharmaceutical compositions may take the form of, for 
example, tablets or capsules prepared by conventional means with pharmaceutically 
acceptable excipients such as binding agents (e.g., pregelatinised maize starch, 
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose,
microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium 
stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or 
wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well 
known in the art. Liquid preparations for oral administration may take the form of, for 
example, solutions, syrups or suspensions, or they may be presented as a dry product for
constitution with water or other suitable vehicle before use. Such liquid preparations may be
prepared by conventional means with pharmaceutically acceptable additives such as
suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats);
emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily
esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or
propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts,
flavoring, coloring and sweetening agents as appropriate. 

Preparations for oral administration may be suitably formulated to give controlled 
release of the active compound. For buccal administration the composition may take the form 
of tablets or lozenges formulated in conventional manner. 

For administration by inhalation, the compounds for use according to the present
invention are conveniently delivered in the form of an aerosol spray presentation from 
pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., 
dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide 
or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined 
by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for
use in an inhaler or insufflator may be formulated containing a powder mix of the compound
and a suitable powder base such as lactose or starch. 

The compounds may be formulated for parenteral administration by injection, e.g., by 
bolus injection or continuous infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampules or in multi-dose containers, with an added preservative. The 
compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous 
vehicles, and may contain formulatory agents such as suspending, stabilizing and/or 
dispersing agents. Alternatively, the active ingredient may be in powder form for constitution 
with a suitable vehicle, e.g., sterile pyrogen-free water, before use. 

The compounds may also be formulated in rectal compositions such as suppositories 
or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or
other glycerides. 

In addition to the formulations described previously, the compounds may also be 
formulated as a depot preparation. Such long acting formulations may be administered by 
implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. 
Thus, for example, the compounds may be formulated with suitable polymeric or 
hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange 
resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt. 

The compositions may, if desired, be presented in a pack or dispenser device which 
may contain one or more unit dosage forms containing the active ingredient. The pack may 
for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser 
device may be accompanied by instructions for administration. 

RECOMBINANT CONSTRUCTS AND EXPRESSION 

The present invention further provides recombinant DNA constructs comprising one 
or more of the nucleotide sequences of the present invention. The recombinant constructs of 
the present invention comprise a vector, such as a plasmid or viral vector, into which a DNA 
or DNA fragment, typically bearing an open reading frame, is inserted, in either orientation. 

The gene products encoded by the subject DNAs may be produced by recombinant 
DNA technology using techniques well known in the art. See, for example, the techniques 
described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively, 
the DNA sequences may be chemically synthesized using, for example, synthesizers. See, for
example, the techniques described in OLIGONUCLEOTIDE SYNTHESIS, 1984, Gait, ed.,
IRL Press, Oxford, which is incorporated by reference herein in its entirety. They may be
assembled from fragments and short oligonucleotide linkers, or from a series of
oligonucleotides. They are preferably made by RT-PCR methods. The resulting synthetic
gene is capable of being expressed in a recombinant vector. 

In some cases the recombinant constructs will be expression vectors, which are
capable of expressing the RNA and/or protein products of the encoded DNA(s). Thus, the 
vector may further comprise regulatory sequences, including for example, a promoter, 
operably linked to the open reading frame (ORF). The vector may further comprise a 
selectable marker sequence. 

Specific initiation signals may also be required for efficient translation of inserted 
target gene coding sequences. These signals include the ATG initiation codon and adjacent 
sequences. In cases where a target DNA that includes its own initiation codon and adjacent
sequences is inserted into the appropriate expression vector, no additional translation control
signals may be needed. However, in cases where only a portion of an ORF is used, 
exogenous translational control signals, including, perhaps, the ATG initiation codon, must be 
provided. Furthermore, the initiation codon must be in phase with the reading frame of the 
desired coding sequence to ensure translation of the entire target. These exogenous 
translational control signals and initiation codons can be of a variety of origins, both natural 
and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate
transcription enhancer elements, transcription terminators, etc. (see Bittner et al., Methods in
Enzymol. 153:516-544 (1987)). Some appropriate cloning and expression vectors for use
with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular
Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the
disclosure of which is hereby incorporated by reference. 
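
The requirement that an exogenous initiation codon be in phase with the desired coding sequence can be checked with simple arithmetic: the distance from the ATG to the first base of the coding sequence must be a multiple of three. The following is a minimal sketch under that assumption; the sequences and function names are hypothetical and are not taken from any particular vector.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_phase(atg_index, coding_start):
    """True when the coding sequence begins a whole number of codons after the ATG."""
    return (coding_start - atg_index) % 3 == 0


def read_through(seq, atg_index):
    """Return the codons read from the ATG up to, but not including, the first in-frame stop."""
    codons = []
    for i in range(atg_index, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


# Hypothetical vector-supplied leader (ATG plus two codons) and partial ORF.
leader = "ATGGGATCC"
insert = "GCTGAAAAATAA"
construct = leader + insert
shifted = leader + "C" + insert  # one extra base puts the insert out of phase

print(is_in_phase(0, len(leader)), read_through(construct, 0))
print(is_in_phase(0, len(leader) + 1), read_through(shifted, 0))
```

In the shifted construct the single extra base changes every downstream codon and, in this example, brings a stop codon into frame before the end of the insert, so the entire target is not translated.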

If desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular expression 
organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767.
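
As a first step toward examining codon pairing, the adjacent codon pairs of an open reading frame can simply be tallied. The sketch below only counts pairs in a hypothetical sequence; it does not reproduce the optimization method of the cited patent, and comparing the counts against pair frequencies for a given expression organism is left as an assumption outside the code.

```python
from collections import Counter


def codon_pairs(orf):
    """Count adjacent codon pairs in an ORF, ignoring any trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return Counter(zip(codons, codons[1:]))


# Hypothetical five-codon ORF fragment.
pairs = codon_pairs("ATGGCTGCTGCTAAA")
for pair, count in pairs.most_common():
    print(pair, count)
```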

The present invention further provides host cells containing at least one of the DNAs 
of the present invention. The host cell can be virtually any cell for which expression vectors 
are available. It may be, for example, a higher eukaryotic host cell, such as a mammalian 
cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic
cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can
be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or
electroporation (Davis et al., Basic Methods in Molecular Biology (1986)).

A wide variety of expression systems are available, such as: yeast (e.g.,
Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the
target DNA; insect cell systems infected with recombinant virus expression vectors (e.g.,
baculovirus) containing the target DNA sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic
virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid)
containing target DNA coding sequences; or mammalian cell systems (e.g., COS, CHO,
BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived
from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian
viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter).

Depending on the system chosen, the resulting product may differ. For example, 
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation
modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern 
different from that expressed in mammalian cells. 

Vectors 

Generally, recombinant expression vectors will include origins of replication and 
selectable markers permitting selection of the host cell, e.g., the ampicillin resistance gene of
E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to
direct transcription of a downstream structural sequence. Such promoters can be derived
from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous
structural sequence is assembled in appropriate phase with translation initiation and
termination sequences, and in one aspect of the invention, a leader sequence capable of
directing secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or
C-terminal identification peptide imparting desired characteristics, e.g., stabilization or 
simplified purification of expressed recombinant product.

Bacterial Expression

Useful expression vectors for bacterial use are constructed by inserting a structural 
DNA sequence encoding a desired protein together with suitable translation initiation and 
termination signals in operable reading phase with a functional promoter. The vector will 
comprise one or more phenotypic selectable markers and an origin of replication to ensure
maintenance of the vector and, if desirable, to provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella
typhimurium and various species within the genera Pseudomonas, Streptomyces, and
Staphylococcus, although others may also be employed as a matter of choice.

Bacterial vectors may be, for example, bacteriophage-, plasmid- or cosmid-based. 
These vectors can comprise a selectable marker and bacterial origin of replication derived 
from commercially available plasmids typically containing elements of the well known 
cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, 
GEM 1 (Promega Biotec, Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, 
pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3,
pKK232-8, pDR540, and pRIT5 (Pharmacia).

These "backbone" sections are combined with an appropriate promoter and the 
structural sequence to be expressed. Bacterial promoters include lac, T3, T7, lambda PR or
PL, trp, and ara.

Following transformation of a suitable host strain and growth of the host strain to an 
appropriate cell density, the selected promoter is derepressed/induced by appropriate means 
(e.g., temperature shift or chemical induction) and cells are cultured for an additional period. 
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and
the resulting crude extract retained for further purification.

In bacterial systems, a number of expression vectors may be advantageously selected 
depending upon the use intended for the protein being expressed. For example, when a large
quantity of such a protein is to be produced, for the generation of antibodies or to screen 
peptide libraries, for example, vectors which direct the expression of high levels of fusion 
protein products that are readily purified may be desirable. Such vectors include, but are not
limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in
which the coding sequence may be ligated into the vector in frame with the lac Z coding
region so that a fusion protein is produced; pIN vectors (Inouye et al., 1985, Nucleic Acids
Res. 13:3101-3109; Van Heeke et al., 1989, J. Biol. Chem. 264:5503-5509); pET vectors,
Studier et al., Methods in Enzymology 185: 60-89 (Academic Press 1990); and the like.

Moreover, pGEX vectors may be used to express foreign polypeptides as fusion 
proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble 
and easily can be purified from lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The pGEX vectors are designed to 
include thrombin or factor Xa protease cleavage sites so that the cloned target gene protein 
can be released from the GST moiety. 

In one embodiment, full length cDNA sequences are appended with in-frame BamHI
sites at the amino terminus and EcoRI sites at the carboxyl terminus using standard PCR
methodologies (Innis et al., 1990, supra) and ligated into the pGEX-2TK vector (Pharmacia,
Uppsala, Sweden). The resulting cDNA construct contains a kinase recognition site at the
amino terminus for radioactive labeling and glutathione S-transferase sequences at the
carboxyl terminus for affinity purification (Nilsson et al., 1985, EMBO J. 4: 1075; Zabeau
and Stanley, 1982, EMBO J. 1:1217).
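
Appending restriction sites by PCR amounts to placing the recognition sequence at the 5' end of each primer, ahead of the portion that anneals to the cDNA. The sketch below assumes the standard recognition sequences GGATCC (BamHI) and GAATTC (EcoRI); the clamp bases, annealing length, frame_pad parameter and example cDNA are hypothetical choices for illustration, and the padding actually needed to keep the insert in frame depends on the vector and must be verified against its sequence.

```python
BAMHI = "GGATCC"
ECORI = "GAATTC"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cdna, anneal_len=18, clamp="CGC", frame_pad=""):
    """Return (forward, reverse) primers carrying 5' BamHI and EcoRI sites.

    frame_pad is a hypothetical hook for extra bases that keep the insert
    in frame with the vector; it is empty by default.
    """
    cdna = cdna.upper()
    for name, site in (("BamHI", BAMHI), ("EcoRI", ECORI)):
        if site in cdna:
            raise ValueError(f"cDNA contains an internal {name} site")
    forward = clamp + BAMHI + frame_pad + cdna[:anneal_len]
    reverse = clamp + ECORI + reverse_complement(cdna[-anneal_len:])
    return forward, reverse


# Hypothetical 45-nucleotide cDNA used only to exercise the function.
cdna = "ATGGCTAGCAAAGGTGAAGAACTGTTTACCGGTGTTGTTCCGTAA"
forward, reverse = design_primers(cdna)
print(forward)
print(reverse)
```

The check for internal sites matters because a cDNA that already contains either recognition sequence would be cut during the subsequent digestion, in which case a different pair of enzymes would have to be chosen.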

Eukaryotic Expression 

Various mammalian cell culture systems can also be employed to express recombinant 
protein. Examples of mammalian expression systems include the COS-7 lines of monkey 
kidney fibroblasts, described by Gluzman, Cell 25:175 (1981), and other cell lines capable of 
expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell 
lines. Mammalian expression vectors will comprise an origin of replication, a suitable 
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, 
splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking 
nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for 
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be 
used to provide the required nontranscribed genetic elements. 

Mammalian promoters include CMV immediate early, HSV thymidine kinase, early
and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Exemplary mammalian
vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG,
and pSVL (Pharmacia). Selectable markers include CAT (chloramphenicol transferase). 

In mammalian host cells, a number of viral-based expression systems may be utilized.
In cases where an adenovirus is used as an expression vector, the coding sequence of interest
may be ligated to an adenovirus transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence. This chimeric gene may then be inserted in the
adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of
the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and
capable of expressing a target protein in infected hosts. (See, e.g., Logan et al., 1984, Proc.
Natl. Acad. Sci. USA 81:3655-3659).

In one embodiment, cDNA sequences encoding the full-length open reading frames 
are ligated into pCMVβ replacing the β-galactosidase gene such that cDNA expression is
driven by the CMV promoter (Alam, 1990, Anal. Biochem. 188: 245-254; MacGregor et al.,
1989, Nucl. Acids Res. 17: 2365; Norton et al., 1985, Mol. Cell. Biol. 5: 281).

In addition, a host cell strain may be chosen which modulates the expression of the 
inserted sequences, or modifies and processes the gene product in the specific fashion desired. 
Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products 
may be important for the function of the protein. Different host cells have characteristic and 
specific mechanisms for the post-translational processing and modification of proteins. 

Appropriate cell lines or host systems can be chosen to ensure the correct modification 
and processing of the foreign protein expressed. To this end, eukaryotic host cells which 
possess the cellular machinery for proper processing of the primary transcript, glycosylation, 
and phosphorylation of the gene product may be used. Such mammalian host cells include 
but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, etc. 

For long-term, high-yield production of recombinant proteins in eukaryotic cells, 
stable expression is preferred. Rather than using expression vectors which contain viral 
origins of replication, host cells can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer sequences, transcription terminators, 
polyadenylation sites, etc.), and a selectable marker. 

Following the introduction of the foreign DNA, engineered cells may be allowed to 
grow for 1-2 days in an enriched media, and then are switched to a selective media. The 
selectable marker in the recombinant plasmid confers resistance to the selection and allows 
cells to stably integrate the plasmid into their chromosomes and grow to form foci which in 
turn can be cloned and expanded into cell lines. This method may advantageously be used to 
engineer cell lines which express the target protein. Such engineered cell lines may be 
particularly useful in screening and evaluation of compounds that affect the endogenous 
activity of the protein. 

A number of selection systems may be used, including but not limited to the herpes 
simplex virus thymidine kinase (Wigler et al., Cell 11:223 (1977)), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad. Sci. USA 48:2026 (1962)), and 
adenine phosphoribosyltransferase (Lowy et al., Cell 22:817 (1980)) genes, which can be 
employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite resistance can be 
used as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler et 
al., Proc. Natl. Acad. Sci. USA 77:3567 (1980); O'Hare et al., 1981, Proc. Natl. Acad. Sci. USA 
78:1527); gpt, which confers resistance to mycophenolic acid (Mulligan et al., Proc. Natl. 
Acad. Sci. USA 78:2072 (1981)); neo, which confers resistance to the aminoglycoside G-418 
(Colberre-Garapin et al., 1981, J. Mol. Biol. 150:1); and hygro, which confers resistance to 
hygromycin (Santerre et al., 1984, Gene 30: 147) genes. 
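
The marker-to-selection pairings listed above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names SELECTION_SYSTEMS and selection_for are hypothetical, and the table encodes nothing beyond the pairings stated in the text.

```python
# Illustrative lookup of the selectable markers named in the text.
# SELECTION_SYSTEMS and selection_for are hypothetical names.
SELECTION_SYSTEMS = {
    # Markers that complement a deficiency in the host cell line.
    "tk": {"kind": "complementation", "requires": "tk- cells"},
    "hgprt": {"kind": "complementation", "requires": "hgprt- cells"},
    "aprt": {"kind": "complementation", "requires": "aprt- cells"},
    # Markers that confer resistance to a selective agent.
    "dhfr": {"kind": "resistance", "agent": "methotrexate"},
    "gpt": {"kind": "resistance", "agent": "mycophenolic acid"},
    "neo": {"kind": "resistance", "agent": "G-418"},
    "hygro": {"kind": "resistance", "agent": "hygromycin"},
}


def selection_for(marker: str) -> str:
    """Describe how cells carrying the given marker would be selected."""
    entry = SELECTION_SYSTEMS[marker.lower()]
    if entry["kind"] == "resistance":
        return f"select with {entry['agent']}"
    return f"use in {entry['requires']}"
```

A call such as selection_for("neo") returns the selective agent to add to the medium, while selection_for("tk") returns the deficient host background the marker requires.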

An alternative fusion protein system allows for the ready purification of non-denatured 
fusion proteins expressed in human cell lines (Janknecht et al., Proc. Natl. Acad. Sci. USA 
88: 8972-8976 (1991)). In this system, the gene of interest is subcloned into a vaccinia-based 
plasmid such that the gene's open reading frame is translationally fused to an amino-terminal 
tag consisting of six histidine residues. Extracts from cells infected with recombinant 
vaccinia virus are loaded onto Ni²⁺ nitriloacetic acid-agarose columns and histidine-tagged 
proteins are selectively eluted with imidazole-containing buffers. 

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is 
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. 
The target coding sequence may be cloned individually into non-essential regions (for 
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter 
(for example the polyhedrin promoter). Successful insertion of a target gene coding sequence 
will result in inactivation of the polyhedrin gene and production of non-occluded recombinant 
virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). These 
recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted 
gene is expressed. (E.g., see Smith et al., 1983, J. Virol. 46: 584; Smith, U.S. Patent No. 
4,215,051). 

While the present proteins can be expressed in recombinant systems, as described 
above, cell-free translation systems can also be employed to produce such proteins using 
RNAs derived from the DNA constructs of the present invention. 

Purification of Recombinant Proteins 

Recombinant proteins produced may be isolated by host cell lysis. This may be 
followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography 
steps. Finally, high performance liquid chromatography (HPLC) can be employed for final 
purification steps. Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use 
of cell lysing agents, like lysozyme and chelators. 

If inclusion bodies are formed in bacterial systems, they may be extracted from cell 
pellets using, for example, detergents, reducing agents, salts, urea, guanidinium chloride and 
extremes of pH (e.g., <4 or >10). If denaturation occurs, protein refolding steps (e.g., 
dialysis) can be used, as necessary, in completing configuration of the mature protein. If 
disulfide bridges are present in the native protein, they may be reoxidized using known 
methods. 

By way of specific non-limiting example, the recombinant bacterial cells, for example 
E. coli, are grown in any of a number of suitable media, for example LB, and the expression 
of the recombinant protein induced by adding IPTG (e.g., lac operator-promoter) to the media 
or switching incubation to a higher temperature (e.g., λ cIts). After culturing the bacteria for 
a further period of between 2 and 24 hours, the cells are collected by centrifugation and 
washed to remove residual media. The bacterial cells are then lysed, for example, by 
disruption in a cell homogenizer and centrifuged to separate the cell membranes from the 
soluble cell components. If the protein aggregates into inclusion bodies, this centrifugation 
can be performed under conditions whereby the dense inclusion bodies are selectively 
enriched by incorporation of sugars such as sucrose into the buffer and centrifugation at a 
selective speed. The inclusion bodies can then be washed in any of several solutions to 
remove some of the contaminating host proteins, then solubilized in solutions containing high 
concentrations of urea (e.g. 8M) or chaotropic agents such as guanidinium hydrochloride in 
the presence of reducing agents such as β-mercaptoethanol or DTT (dithiothreitol). 

At this stage it may be advantageous to incubate the protein for several hours under 
conditions suitable for the protein to undergo a refolding process into a conformation which 
more closely resembles that of the native protein. Such conditions generally include low 
protein concentrations (less than 500 μg/ml), low levels of reducing agent, concentrations of 
urea less than 2 M and often the presence of reagents such as a mixture of reduced and 
oxidized glutathione which facilitate the interchange of disulphide bonds within the protein 
molecule. The refolding process can be monitored, for example, by SDS-PAGE or with 
antibodies which are specific for the native molecule. Following refolding, the protein can 
then be purified further and separated from the refolding mixture by chromatography on any 
of several supports including ion exchange resins, gel permeation resins or on a variety of 
affinity columns. 
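
As a rough illustration of the refolding conditions above, the sketch below computes the smallest fold-dilution that brings a solubilized sample down to the stated thresholds (urea 2 M, protein 500 μg/ml). The function name and the starting protein concentration in the usage note are assumptions rather than values from the text. Because the text calls for concentrations below these thresholds, one would dilute somewhat beyond the returned factor.

```python
def min_dilution_factor(urea_start_m, protein_start_ug_ml,
                        urea_max_m=2.0, protein_max_ug_ml=500.0):
    """Smallest fold-dilution that brings both urea and protein to the limits.

    Hypothetical helper: the default limits are the thresholds quoted in the
    text; the starting concentrations must be supplied by the caller.
    """
    urea_fold = urea_start_m / urea_max_m
    protein_fold = protein_start_ug_ml / protein_max_ug_ml
    # Whichever component needs more dilution sets the factor; never below 1.
    return max(urea_fold, protein_fold, 1.0)
```

For example, a sample in 8 M urea at an assumed 5000 μg/ml protein gives a factor of 10, set by the protein limit rather than the urea limit.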

Labeling Proteins 

When used as a component in assay systems such as those described below, the target 
protein may be labeled, either directly or indirectly, to facilitate detection of the presentres- 
like molecules either in vitro or in vivo. Any of a variety of suitable labeling systems may be 
used including but not limited to radioisotopes such as ¹²⁵I; enzyme labeling systems that 
generate a detectable colorimetric signal or light when exposed to substrate; and fluorescent 
labels. 

Where recombinant DNA technology is used for protein production, it may be 
advantageous to engineer fusion proteins that can facilitate labeling, immobilization and/or 
detection. These fusion proteins may, for example, add amino acids which facilitate further 
chemical modification. They also may add a functional moiety, such as an enzyme, which 
directly facilitates detection. 

TRANSGENIC ANIMALS 

The invention further contemplates animal models for studying the function of the 
present molecules and for overproducing the protein products. The disclosed DNA sequences 
may be used in conjunction with techniques for producing transgenic animals that are well 
known to those of skill in the art. 

To prepare transgenic animals, target gene sequences may for example be introduced 
into, and overexpressed in, the genome of the animal of interest, or, if endogenous target 
gene sequences are present, they may either be overexpressed or, alternatively, be disrupted 
in order to underexpress or inactivate target gene expression, such as described for the 
disruption of apoE in mice (Plump et al., Cell 71: 343-353 (1992)). 

In order to overexpress a target gene sequence, the coding portion of the target gene 
sequence may be ligated to a regulatory sequence which is capable of driving gene expression 
in the animal and cell type of interest. Such regulatory regions will be well known to those of 
skill in the art, and may be utilized in the absence of undue experimentation. 

For underexpression of an endogenous target gene sequence, such a sequence may be 
isolated and engineered such that when reintroduced into the genome of the animal of interest, 
the endogenous target gene alleles will be inactivated. Preferably, the engineered target gene 
sequence is introduced via gene targeting such that the endogenous target sequence is 
disrupted upon integration of the engineered target gene sequence into the animal's genome. 

Animals of any species, including, but not limited to, mice, rats, rabbits, guinea pigs, 
pigs, micro-pigs, goats, and non-human primates, e.g., baboons, monkeys, and chimpanzees 
may be used to generate cardiovascular disease animal models. Goats, cows and sheep are 
particularly preferred for producing protein in vivo. 

Any technique known in the art may be used to introduce a target gene transgene into 
animals to produce the founder lines of transgenic animals. Such techniques include, but are 
not limited to pronuclear microinjection (Hoppe et al., U.S. Pat. No. 4,873,191 (1989)); 
retrovirus mediated gene transfer into germ lines (Van der Putten et al., Proc. Natl. Acad. 
Sci. USA 82:6148-6152 (1985)); gene targeting in embryonic stem cells (Thompson et al., 
Cell 56:313-321 (1989)); electroporation of embryos (Lo, Mol. Cell. Biol. 3:1803-1814 
(1983)); and sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723 (1989)); etc. 
For a review of such techniques, see Gordon, Transgenic Animals, Intl. Rev. Cytol. 
115:171-229 (1989). 

The present invention provides for transgenic animals that carry the transgene in all 
their cells, as well as animals which carry the transgene in some, but not all their cells, i.e., 
mosaic animals. The transgene may be integrated as a single transgene or in concatamers, 
e.g., head-to-head tandems or head-to-tail tandems. The transgene may also be selectively 
introduced into and activated in a particular cell type by following, for example, the teaching 
of Lasko et al. (Lasko et al., Proc. Natl. Acad. Sci. USA 89:6232-6236 (1992)). The 
regulatory sequences required for such a cell-type specific activation will depend upon the 
particular cell type of interest, and will be apparent to those of skill in the art. When it is 
desired that the target gene be integrated into the chromosomal site of the endogenous target 
gene, gene targeting is preferred. Briefly, when such a technique is to be utilized, vectors 
containing some nucleotide sequences homologous to the endogenous target gene of interest 
are designed for the purpose of integrating, via homologous recombination with chromosomal 
sequences, into and disrupting the function of the nucleotide sequence of the endogenous 
target gene. 

The transgene may also be selectively introduced into a particular cell type, thus 
inactivating the endogenous gene of interest in only that cell type, by following, for example, 
the teaching of Gu et al. (Science 265: 103-106 (1994)). The regulatory sequences required 
for such a cell-type specific inactivation will depend upon the particular cell type of interest, 
and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expression of the recombinant target 
gene and protein may be assayed utilizing standard techniques. Initial screening may be 
accomplished by Southern blot analysis or PCR techniques to analyze animal tissues to assay 
whether integration of the transgene has taken place. The level of mRNA expression of the 
transgene in the tissues of the transgenic animals may also be assessed using techniques which 
include but are not limited to Northern blot analysis of tissue samples obtained from the 
animal, in situ hybridization analysis, and RT-PCR. Samples of target gene-expressing 
tissue may also be evaluated immunocytochemically using antibodies specific for the target 
gene transgene product of interest. 

The transgenic animals that express target gene mRNA or target gene transgene 
peptide (detected immunocytochemically, using antibodies directed against the target gene 
product's epitopes) at easily detectable levels should then be further evaluated to identify those 
animals which display characteristic increased susceptibility to carcinogenesis. Additionally, 
specific cell types within the transgenic animals may be analyzed and assayed in vitro for 
cellular phenotypes characteristic of mutant phenotype. 

Once target gene transgenic founder animals are produced, they may be bred, inbred, 
outbred, or crossbred to produce colonies of the particular animal. Examples of such 
breeding strategies include but are not limited to: outbreeding of founder animals with more 
than one integration site in order to establish separate lines; inbreeding of separate lines in 
order to produce compound target gene transgenics that express the target gene transgene of 
interest at higher levels because of the effects of additive expression of each target gene 
transgene; crossing of heterozygous transgenic animals to produce animals homozygous for a 
given integration site in order both to augment expression and eliminate the possible need for 
screening of animals by DNA analysis; crossing of separate homozygous lines to produce 
compound heterozygous or homozygous lines; breeding animals to different inbred genetic 
backgrounds so as to examine effects of modifying alleles on expression of the target gene 
transgene and the possible development of carcinogenesis. One such approach is to cross the 
target gene transgenic founder animals with a wild type strain to produce an F1 generation 
that exhibits increased susceptibility to carcinogenesis. The F1 generation may then be inbred 
in order to develop a homozygous line, if it is found that homozygous target gene transgenic 
animals are viable. 
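
The expected outcome of the crosses described above can be sketched with a simple single-locus model. The Python example below is a simplified illustration that assumes one integration site segregating in Mendelian fashion; the allele labels "T" (transgene) and "+" (no transgene) and the function name cross are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a, parent_b):
    """Return genotype -> expected fraction of offspring for one locus.

    Each parent is a pair of alleles, e.g. ("T", "+") for a hemizygous
    transgenic animal. Assumes a single integration site and equal
    transmission of both alleles (an idealization).
    """
    offspring = Counter(tuple(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(offspring.values())
    return {genotype: Fraction(count, total) for genotype, count in offspring.items()}


# Founder x wild type: half of the offspring are expected to carry the transgene.
f1 = cross(("T", "+"), ("+", "+"))

# Intercross of transgene carriers: one quarter are expected to be homozygous.
f2 = cross(("T", "+"), ("T", "+"))
```

Under these assumptions, f2[("T", "T")] is 1/4, which is the fraction of animals expected to be homozygous for a given integration site if such animals are viable.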

Methods of generating "knockout" mice using homologous recombination in 
embryonic stem cells are well known in the art. Suitable methods are described, for example, 
in Mansour et al., Nature, 336:348 (1988); Zijlstra et al., Nature, 342:435 (1989) and 
344:742 (1990); and Hasty et al., Nature, 350:243 (1991). Genomic DNA for the target gene can 
be obtained by conventional methods using the cDNA sequence as a probe in a commercially 
available genomic DNA library. 

Briefly, a genomic fragment is cleaved with a restriction endonuclease and a 
heterologous cassette containing a neomycin-resistance gene is inserted at the cleavage site. A 
suitable cassette is the GTHI neo cassette described by Lufkin et al., Cell 66:1105 (1991). 
The modified genomic fragment is cloned into a suitable targeting vector that is introduced 
into murine embryonic stem cells by electroporation. Cells that have undergone homologous 
recombination (and hence disruption of the gene) are selected by resistance to G418, and used 
to generate chimeric mice using well known methods. See Lufkin et al., supra. Traditional 
breeding methods then can be used to generate mice that are homozygous for the disrupted 
gene. 

The phenotype of mice that are homozygous for the mutation then can be studied to 
provide insights into the role of the protein in, for example, carcinogenesis. These mice also 
can be used as models for developing new treatments for cancers. If this mutation is lethal in 
homozygous mice (for example during embryogenesis), heterozygous mice, which express 
only half the amount of the protein, can also be studied. 


GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements controlling expression of 
that protein, are found to be associated with a malignant phenotype, control of cellular 
proliferation can be restored by gene therapy methods. For example, overexpression of the 
protein can be counteracted by concurrent expression of an antisense molecule that binds to 
and inhibits expression of the mRNA encoding the protein. Alternatively, overexpression can 
be inhibited in an analogous manner using a ribozyme that cleaves the mRNA. In another 
embodiment, where expression of a mutated protein induces the malignant phenotype, 
concomitant expression of the non-mutated molecule via introduction of an exogenous gene 
may be used. Methods of using antisense and ribozyme technology to control gene 
expression, or of gene therapy methods for expression of an exogenous gene in this manner 
are well known in the art. 
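
The base-pairing logic behind an antisense molecule can be shown in a few lines. The Python sketch below simply returns the reverse complement of an mRNA segment; it is a minimal illustration, the function name antisense_rna and the example sequence are hypothetical, and it says nothing about which region of a transcript would make an effective target.

```python
# Complement table for RNA bases (A pairs with U, C pairs with G).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna_segment: str) -> str:
    """Return the antisense strand (written 5'->3') for an mRNA segment."""
    seq = mrna_segment.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("expected an RNA sequence containing only A, C, G, U")
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


# Made-up six-base example segment.
example = antisense_rna("AUGGCC")
```

Here example is "GGCCAU", which pairs base-for-base with the input segment when the two strands are aligned antiparallel.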

Each of these methods requires a system for introducing a vector into the cells 
containing the mutated gene. The vector encodes either an antisense or ribozyme transcript of 
the inventive protein. The construction of a suitable vector can be achieved by any of the 
methods well-known in the art for the insertion of exogenous DNA into a vector. See, e.g., 
Sambrook et al., Molecular Cloning (Cold Spring Harbor Press 2d ed. 1989), which is 
incorporated herein by reference. In addition, the prior art teaches various methods of 
introducing exogenous genes into cells in vivo. See Rosenberg et al., Science 242: 1575-1578 
(1988) and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated herein by 
reference. The routes of delivery include systemic administration and administration in situ. 
Well-known techniques include systemic administration with cationic liposomes, and 
administration in situ with viral vectors. Any one of the gene delivery methodologies 
described in the prior art is suitable for the introduction of a recombinant vector containing an 
inventive gene according to the invention into a MTX-resistant, transport-deficient cancer 
cell. A listing of present-day vectors suitable for the purpose of this invention is set forth in 
Hodgson, Bio/Technology 13: 222 (1995), which is incorporated by reference. 

For example, liposome-mediated gene transfer is a suitable method for the 
introduction of a recombinant vector containing an inventive gene according to the invention 
into a MTX-resistant, transport-deficient cancer cell. The use of a cationic liposome, such as 
DC-Chol/DOPE liposome, has been widely documented as an appropriate vehicle to deliver 
DNA to a wide range of tissues through intravenous injection of DNA/cationic liposome 
complexes. See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al., Science 267:209-211 
(1993), which are herein incorporated by reference. Liposomes transfer genes to the 
target cells by fusing with the plasma membrane. The entry process is relatively efficient, but 
once inside the cell, the liposome-DNA complex has no inherent mechanism to deliver the 
DNA to the nucleus. As such, most of the lipid and DNA gets shunted to cytoplasmic 
waste systems and destroyed. The obvious advantage of liposomes as a gene therapy vector is 
that liposomes contain no proteins, which thus minimizes the potential of host immune 
responses. 

As another example, viral vector-mediated gene transfer is also a suitable method for 
the introduction of the vector into a target cell. Appropriate viral vectors include adenovirus 
vectors and adeno-associated virus vectors, retrovirus vectors and herpesvirus vectors. 

Adenoviruses are linear, double stranded DNA viruses complexed with core proteins 
and surrounded by capsid proteins. The common serotypes 2 and 5, which are not associated 
with any human malignancies, are typically the base vectors. By deleting parts of the virus 
genome and inserting the desired gene under the control of a constitutive viral promoter, the 
virus becomes a replication deficient vector capable of transferring the exogenous DNA to 
differentiated, non-proliferating cells. To enter cells, the adenovirus fibre interacts with 
specific receptors on the cell surface, and the adenovirus surface proteins interact with the cell 
surface integrins. The virus penton-cell integrin interaction provides the signal that brings the 
exogenous gene-containing virus into a cytoplasmic endosome. The adenovirus breaks out of 
the endosome and moves to the nucleus, the viral capsid falls apart, and the exogenous DNA 
enters the cell nucleus where it functions, in an epichromosomal fashion, to express the 
exogenous gene. Detailed discussions of the use of adenoviral vectors for gene therapy can 
be found in Berkner, Biotechniques 6:616-629 (1988) and Trapnell, Advanced Drug Delivery 
Rev. 12:185-199 (1993), which are herein incorporated by reference. Adenovirus-derived 
vectors, particularly non-replicative adenovirus vectors, are characterized by their ability to 
accommodate exogenous DNA of 7.5 kb, relative stability, wide host range, low 
pathogenicity in man, and high titers (10* to 10* plaque forming units per cell). See 
Stratford-Perricaudet et al., PNAS 89:2581 (1992). 

Adeno-associated virus (AAV) vectors also can be used for the present invention. 
AAV is a linear single-stranded DNA parvovirus that is endogenous to many mammalian 
species. AAV has a broad host range despite the limitation that AAV is a defective 
parvovirus which is dependent totally on either adenovirus or herpesvirus for its reproduction 
in vivo. The use of AAV as a vector for the introduction into target cells of exogenous DNA 
is well-known in the art. See, e.g., Lebkowski et al., Mole. & Cell. Biol. 8:3988 (1988), 
which is incorporated herein by reference. In these vectors, the capsid gene of AAV is 
replaced by a desired DNA fragment, and transcomplementation of the deleted capsid 
function is used to create a recombinant virus stock. Upon infection the recombinant virus 
uncoats in the nucleus and integrates into the host genome. 

Another suitable virus-based gene delivery mechanism is retroviral vector-mediated 
gene transfer. In general, retroviral vectors are well-known in the art. See Breakfield et al., 
Mole. Neuro. Biol. 7:339 (1987) and Shih et al., in Vaccines 85: 177 (Cold Spring Harbor 
Press 1985). A variety of retroviral vectors and retroviral vector-producing cell lines can be 
used for the present invention. Appropriate retroviral vectors include Moloney Murine 
Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous 
Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, human immunodeficiency virus, 
myeloproliferative sarcoma virus, and mammary tumor virus. These vectors include 
replication-competent and replication-defective retroviral vectors. In addition, amphotropic 
and xenotropic retroviral vectors can be used. In carrying out the invention, retroviral 
vectors can be introduced to a tumor directly or in the form of free retroviral vector 
producing-cell lines. Suitable producer cells include fibroblasts, neurons, glial cells, 
keratinocytes, hepatocytes, connective tissue cells, ependymal cells, and chromaffin cells. See 
Wolff et al., PNAS 84:3344 (1989). 

Retroviral vectors generally are constructed such that the majority of their structural 
genes are deleted or replaced by exogenous DNA of interest, and such that the likelihood is 
reduced that viral proteins will be expressed. See Bender et al., J. Virol. 61:1639 (1987) and 
Armento et al., J. Virol. 57:1647 (1987), which are herein incorporated by reference. To 
facilitate expression of the antisense or ribozyme molecule of the inventive protein, a 
retroviral vector employed in the present invention must integrate into the genome of the host 
cell, an event which occurs only in mitotically active cells. The necessity for host 
cell replication effectively limits retroviral gene expression to tumor cells, which are highly 
replicative, and to a few normal tissues. The normal tissue cells theoretically most likely to 
be transduced by a retroviral vector, therefore, are the endothelial cells that line the blood 
vessels that supply blood to the tumor. In addition, it is also possible that a retroviral vector 
would integrate into white blood cells either in the tumor or in the blood circulating through 
the tumor. 

The spread of retroviral vector to normal tissues, however, is limited. The local 
administration to a tumor of a retroviral vector or retroviral vector producing cells will 
restrict vector propagation to the local region of the tumor, minimizing transduction, 
integration, expression and subsequent cytotoxic effect on surrounding cells that are 
mitotically active. 

Both replicatively deficient and replicatively competent retroviral vectors can be used 
in the invention, subject to their respective advantages and disadvantages. For instance, for 
tumors that have spread regionally, such as lung cancers, the direct injection of cell lines that 
produce replication-deficient vectors may not deliver the vector to a large enough area to 
completely eradicate the tumor, since the vector will be released only from the original 
producer cells and their progeny, and diffusion is limited. Similar constraints apply to the 
application of replication deficient vectors to tumors that grow slowly, such as human breast 
cancers which typically have doubling times of 30 days versus the 24 hours common among 
human gliomas. The much shortened survival-time of the producer cells, probably no more 
than 7-14 days in the absence of immunosuppression, limits to only a portion of their 
replicative cycle the exposure of the tumor cells to the retroviral vector. 
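
The arithmetic behind this constraint can be made explicit. The short Python sketch below divides the producer-cell survival window by the tumor doubling time, using only the figures quoted above (7-14 days of survival, a 30-day doubling time for breast cancers, 24 hours for gliomas). The function name is hypothetical, and treating one doubling time as one replicative cycle is a simplifying assumption.

```python
def cycles_covered(survival_days: float, doubling_time_days: float) -> float:
    """Number of tumor doubling times spanned by producer-cell survival."""
    return survival_days / doubling_time_days


# Figures quoted in the text; one doubling time is treated as one cycle.
breast_low = cycles_covered(7, 30)    # about 0.23 of a cycle
breast_high = cycles_covered(14, 30)  # about 0.47 of a cycle
glioma_low = cycles_covered(7, 1)     # 7 cycles
glioma_high = cycles_covered(14, 1)   # 14 cycles
```

On these numbers the vector is present for less than half of one doubling time in a slowly growing breast tumor, but for roughly 7 to 14 doubling times in a glioma.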

The use of replication-defective retroviruses for treating tumors requires producer 
cells and is limited because each replication-defective retrovirus particle can enter only a 
single cell and cannot productively infect others thereafter. Because these replication- 
defective retroviruses cannot spread to other tumor cells, they would be unable to completely 
penetrate a deep, multilayered tumor in vivo. See Markert et al., Neurosurg. 77: 590 (1992). 
The injection of replication-competent retroviral vector particles or a cell line that produces a 
replication-competent retroviral vector virus may prove to be a more effective therapeutic 
because a replication competent retroviral vector will establish a productive infection that will 
transduce cells as long as it persists. Moreover, replicatively competent retroviral vectors 
may follow the tumor as it metastasizes, carried along and propagated by transduced tumor 
cells. The risks for complications are greater with replicatively competent vectors, however. 
Such vectors may pose a greater risk than replicatively deficient vectors of transducing normal 
tissues, for instance. The risks of undesired vector propagation for each type of cancer and 
affected body area can be weighed against the advantages in the situation of replicatively 
competent versus replicatively deficient retroviral vector to determine an optimum treatment. 

Both amphotropic and xenotropic retroviral vectors may be used in the invention. 
Amphotropic viruses have a very broad host range that includes most or all mammalian cells, 
as is well known to the art. Xenotropic viruses can infect all mammalian cells except 
mouse cells. Thus, amphotropic and xenotropic retroviruses from many species, including 
cows, sheep, pigs, dogs, cats, rats, and mice, inter alia can be used to provide retroviral 
vectors in accordance with the invention, provided the vectors can transfer genes into 
proliferating human cells in vivo. 

Clinical trials employing retroviral vector therapy treatment of cancer have been 
approved in the United States. See Culver, Clin. Chem. 40: 510 (1994). Retroviral 
vector-containing cells have been implanted into brain tumors growing in human patients. See 
Oldfield et al., Hum. Gene Ther. 4: 39 (1993). These retroviral vectors carried the HSV-1 
thymidine kinase (HSV-tk) gene into the surrounding brain tumor cells, which conferred 
sensitivity of the tumor cells to the antiviral drug ganciclovir. Some of the limitations of 
current retroviral based cancer therapy, as described by Oldfield are: (1) the low titer of virus 
produced, (2) virus spread is limited to the region surrounding the producer cell implant, (3) 
possible immune response to the producer cell line, (4) possible insertional mutagenesis and 
transformation of retroviral infected cells, (5) only a single treatment regimen of pro-drug, 
ganciclovir, is possible because the "suicide" product kills retrovirally infected cells and 
producer cells and (6) the bystander effect is limited to cells in direct contact with retrovirally 
transformed cells. See Bi et al., Human Gene Therapy 4: 725 (1993). 

Yet another suitable virus-based gene delivery mechanism is herpesvirus vector- 
mediated gene transfer. While much less is known about the use of herpesvirus vectors, 
replication-competent HSV-1 viral vectors have been described in the context of antitumor 
therapy. See Martuza et al., Science 252: 854 (1991), which is incorporated herein by 
reference. 

The present invention also contemplates, for certain molecules described below, 
methods for diagnosis of human disease. In particular, patients can be screened for the 
occurrence of cancers, or likelihood of occurrence of cancers, associated with mutations in 
the encoded protein. DNA from tumor tissue obtained from patients suffering from cancer 
can be isolated and the gene encoding the protein can be sequenced. By examining a number 
of patients in this manner, mutations in the gene that are associated with a malignant cellular 
phenotype can be identified. In addition, correlation of the nature of the observed mutations 
with subsequent observed clinical outcomes allows development of a prognostic model for the 
predicted outcome in a particular patient. 

Screening for mutations conveniently can be carried out at the DNA level by use of 
PCR, although the skilled artisan will be aware that many other well known methods are 
available for the screening. PCR primers can be selected that flank known mutation sites, and 
the PCR products can be sequenced to detect the occurrence of the mutation. Alternatively, 
the 3' residue of one PCR primer can be selected to be a match only for the residue found in 
the unmutated gene. If the gene is mutated, there will be a mismatch at the 3' end of the 
primer, and primer extension cannot occur, and no PCR product will be obtained. 
Alternatively, primer mixtures can be used where the 3' residue of one primer is any 
nucleotide other than the nonmutated residue. Observation of a PCR product then indicates 
that a mutation has occurred. Other methods of using, for example, oligonucleotide probes to 
screen for mutations are described, for example, in U.S. Patent No. 4,871,838, which is 
herein incorporated by reference in its entirety. 
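
The 3'-mismatch logic described above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which primer extension occurs only when the 3' residue of the primer equals the template base at the interrogated site; the sequences, the mutation position and the function names are hypothetical and serve only as an illustration.

```python
def three_prime_matches(primer: str, template: str, position: int) -> bool:
    """Return True if the primer's 3' residue equals the template base at `position`.

    Both sequences are written 5'->3' on the same strand, and `position` is the
    0-based index of the template base interrogated by the primer's 3' residue.
    """
    return primer[-1].upper() == template[position].upper()


def expect_product(primer: str, template: str, position: int) -> bool:
    # Simplifying assumption: extension (and hence a PCR product) requires a
    # perfect match at the 3' end of the primer.
    return three_prime_matches(primer, template, position)


# Hypothetical sequences: the mutant differs from the wild type at index 8.
wild_type = "ACGTTGCAAGCT"
mutant = "ACGTTGCATGCT"

# Primer whose 3' residue matches only the unmutated residue at index 8.
primer = wild_type[:9]

print(expect_product(primer, wild_type, 8))  # product expected
print(expect_product(primer, mutant, 8))     # no product expected
```

In this toy model the absence of a product indicates a mutation when the primer is matched to the unmutated residue; with a primer matched to a mutant residue, the interpretation is reversed.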

Alternatively, antibodies can be generated that selectively bind either mutated or non- 
mutated protein. The antibodies then can be used to screen tissue samples for occurrence of 
mutations in a manner analogous to the DNA-based methods described supra. 

The diagnostic methods described above can be used not only for diagnosis and for 
prognosis of existing disease, but may also be used to predict the likelihood of the future 
occurrence of disease. For example, clinically healthy patients can be screened for mutations 
in the inventive molecule that correlate with later disease onset. Such mutations may be 
observed in the heterozygous state in healthy individuals. In such cases a single mutation 
event can effectively disable proper functioning of the gene and induce a transformed or 
malignant phenotype. This screening also may be carried out prenatally or neonatally. 

DNA molecules according to the invention also are well suited for use in so-called 
"gene chip" diagnostic applications. Such applications have been developed by, inter alia, 
Synteni and Affymetrix. Briefly, all or part of the DNA molecules of the invention can be 
used either as a probe to screen a polynucleotide array on a "gene chip," or they may be 
immobilized on the chip itself and used to identify other polynucleotides via hybridization to 
the surface of the chip. In this manner, for example, related genes can be identified, or 
expression patterns of the gene in various tissues can be simultaneously studied. Such gene 
chips have particular application for diagnosis of disease, or in forensic analysis to detect the 
presence or absence of an analyte. Suitable chip technology is described, for example, in 
Wodicka et al., Nature Biotechnology 15:1359 (1997), which is hereby incorporated by 
reference in its entirety, and references cited therein. 

PROTEIN-PROTEIN INTERACTIONS 

Due to their similarity to certain known proteins, it is anticipated that some of the 
inventive protein molecules will interact with another class of cellular proteins. This is 
particularly true of those molecules containing leucine zipper motifs. 

Any method suitable for detecting protein-protein interactions can be employed for 
identifying interacting targets. Among the traditional methods which can be employed are co- 
immunoprecipitation, crosslinking and co-purification through gradients or chromatographic 
columns. Utilizing procedures such as these allows for the identification of GAP gene 
products. Once identified, a GAP protein can be used, in conjunction with standard 
techniques, to identify its corresponding pathway gene. For example, at least a portion of the 
amino acid sequence of the pathway gene product can be ascertained using techniques well 
known to those of skill in the art, such as via the Edman degradation technique (see, e.g., 
Creighton, 1983, PROTEINS: STRUCTURES AND MOLECULAR PRINCIPLES, W.H. 
Freeman & Co., N.Y., pp. 34-49). The amino acid sequence obtained can be used as a guide 
for the generation of oligonucleotide mixtures that can be used to screen for pathway gene 
sequences. Screening can be accomplished, for example, by standard hybridization or PCR 
techniques. Techniques for the generation of oligonucleotide mixtures and for screening are 
well known. (See, e.g., Ausubel, supra, and PCR PROTOCOLS: A GUIDE TO METHODS 
AND APPLICATIONS, 1990, Innis et al., eds., Academic Press, Inc., New York.) 

Additionally, methods can be employed which result in the simultaneous identification 
of interacting target genes. One method which detects protein interactions in vivo, the two- 
hybrid system, is described in detail for illustration purposes only and not by way of 
limitation. One version of this system has been described (Chien et al., Proc. Natl. Acad. 
Sci. USA, 88: 9578-9582 (1991)) and is commercially available from Clontech (Palo Alto, 
CA). 

Briefly, utilizing such a system, plasmids are constructed that encode two hybrid 
proteins: one consists of the DNA-binding domain of a transcription activator protein fused to 
a known protein, in this case an inventive protein, and the other contains the activator 
protein's activation domain fused to an unknown protein (a putative GAP, for instance) that is 
encoded by a cDNA which has been recombined into this plasmid as part of a cDNA library. 
The plasmids are transformed into a strain of the yeast Saccharomyces cerevisiae that contains 
a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's 
binding sites. Neither hybrid protein alone can activate transcription of the reporter gene: 
the DNA-binding domain hybrid cannot because it does not provide activation function, and 
the activation domain hybrid cannot because it cannot localize to the activator's binding sites. 
Interaction of the two hybrid proteins reconstitutes the functional activator protein and results 
in expression of the reporter gene, which is detected by an assay for the reporter gene 
product. 

The two-hybrid system or related methodology can be used to screen activation 
domain libraries for proteins that interact with a known "bait" gene product. By way of 
example, and not by way of limitation, gene products known to be involved in TH cell 
subpopulation-related disorders and/or differentiation, maintenance, and/or effector function 
of the subpopulations can be used as the bait gene products. Total genomic or cDNA 
sequences are fused to the DNA encoding an activation domain. This library and a plasmid 
encoding a hybrid of the bait gene product fused to the DNA-binding domain are 
cotransformed into a yeast reporter strain, and the resulting transformants are screened for 
those that express the reporter gene. For example, and not by way of limitation, the bait gene 
can be cloned into a vector such that it is translationally fused to the DNA encoding the DNA- 
binding domain of the GAL4 protein. These colonies are purified and the library plasmids 
responsible for reporter gene expression are isolated. DNA sequencing is then used to 
identify the proteins encoded by the library plasmids. 

The present invention, thus generally described, will be understood more readily by 
reference to the following examples, which are provided by way of illustration and are not 
intended to be limiting of the present invention. 

The examples below are provided to illustrate the subject invention. These examples 
are provided by way of illustration and are not included for the purpose of limiting the 
invention. 

EXAMPLES 
EXAMPLE I: cDNA Library Construction 

cDNA library plates and clones originated from five cDNA libraries that were 
constructed by directional cloning. These are available through the Resource Center 
(http://www.rzpd.de) of the German Genome Project. In particular, the hfbr2 (human fetal 
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney; DKFZp566) libraries were 
generated using the Smart kit (Clontech), except that PCR was carried out with primers that 
contained uracil residues to permit directional cloning without restriction digestion and 
ligation, and were complementary with the pAMP1 (LifeTechnologies) cloning sites for 
directional cloning. The htes3 (human testes; DKFZp434), hute1 (human uterus; DKFZp586) 
and hmcf1 (human mammary carcinoma; DKFZp727) libraries are conventional (Gubler, U., 
Hoffman, B.J. (1983) A simple and very efficient method for generating cDNA libraries. 
Gene 25, 263-269), size-selected cDNA libraries. They are cloned into pSPORT1 
(LifeTechnologies) via a NotI site which is introduced during reverse transcription 
downstream of the oligo dT primer and a SalI site that is introduced by the ligation of 
adapters. The human mammary carcinoma library was constructed from MCF7 cells. 

The cDNA sequences of this application were first identified among the sequences 
comprising various libraries. Technology has advanced considerably since the first cDNA 
libraries were made. Many small variations in both chemicals and machinery have been 
instituted over time, and these have improved both the efficiency and safety of the process. 
Although the cDNAs could be obtained using an older procedure, the procedure presented in 
this application is exemplary of one currently being used by persons skilled in the art. For the 
purpose of providing an exemplary method, the mRNA isolation and cDNA library 
construction described here is for the MCF-7 library (DKFZp727) from which the clones 
named DKFZphmcf1_xxyyxx were obtained. 

The human cell line MCF-7 was grown in DMEM supplemented with 10% fetal calf 
serum until confluency. 3X10* cells were harvested with a cell scraper in PBS. Cells were 
lysed in buffer containing 0.5 % NP-40 to leave the nuclei intact. The debris was pelleted by 
centrifugation at 15 000 x g for 10 minutes at 4 degrees Celsius. Proteins in the supernatant 
were degraded in presence of SDS and Proteinase K (30 minutes at 56 degrees Celsius). 
Proteins were removed by phenol/chloroform extraction, and RNA was precipitated 
from the aqueous phase with Na-acetate and ethanol. Polyadenylated messages were isolated 
using Qiagen Oligotex (QIAGEN, Hilden Germany). 

First strand cDNA synthesis was accomplished using an oligo (dT) primer which also 
contained a NotI restriction site. Second strand synthesis was performed using a 
combination of DNA polymerase I, E. coli ligase and RNase H, followed by the addition of a 
SalI adaptor to the blunt-ended cDNA. The SalI-adapted, double-stranded cDNA was then 
digested with NotI restriction enzyme, and fractionated by size on an agarose gel. DNA of the 
appropriate size was cut from the gel and cast into a second gel at a 90° angle. After 
electrophoresis in the second dimension, cDNA of the appropriate size was cut from the gel. 
The agarose block was broken down with the help of gelase. The cDNA was purified by 
two phenol extractions and an ethanol precipitation. The cDNA was ligated into SalI/NotI 
pre-digested pSPORT1 vector (LifeTechnologies) and transformed into DH10B bacteria. 

The libraries were arrayed into 384-well microtiter plates and spotted on high density 
nylon membranes for hybridization analysis. Filters and clones are available through the 
Resource Center. Whole plates were distributed to the sequencing partners of the consortium 
for systematic sequencing. 

EXAMPLE II: Sequencing of cDNA Clones 

All clones in the 384-well microtiter plates were sequenced from the 5' end. 
Sequencing was done preferentially using dye terminator chemistry (ABD or Amersham) on 
ABI automated DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL 
prototype instruments (Arakis), mainly with dye primer chemistry. 

The resulting expressed sequence tag (EST) sequences ("r1 ESTs" = sequenced from 
5'-end) were analysed for: 

a) the lack of identical matches with known genes. 

For this, the EST sequence was blasted against the cDNA consortium's own 
database and after that against public databases (with BLASTn and BLASTx against 
EMBL/EMBLNEW and assembled ESTs; please refer to EXAMPLE III: Bioinformatics 
analysis of full length cDNAs, for description and parameter settings). ESTs which were 
identical to known genes over more than 100 bp, with fewer than 2 mismatches, were excluded 
from further analysis. 

b) the presence of an open reading frame 

Open reading frames (ORFs) were detected with a tool developed by the Munich 
Information Center for Protein Sequences (MIPS) called ORF-map. ORF-map visualises 
potential start and stop codons. If an ORF without a stop codon was detected in an r1 EST, 
the sequence was processed further. 

c) the presence of GC rich sequences 

A script developed by MIPS computed the GC content of the r1 sequence, which 
should be >40%. Writing similar scripts is within the ordinary skill of one in bioinformatics. 

d) the lack of repeat structures 

Repeats such as Alu, LINE or CA repeats were detected by blasting (BLASTn and 
BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, for 
description and parameter settings) against a repeat database compiled by MIPS. If a repeat 
was present within the r1 sequence, the sequence was not processed further. 
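
The four criteria a) to d) can be combined into a single filter. The following Python sketch is only an illustration under stated assumptions: the results of the database and repeat searches are passed in as booleans (hypothetical arguments `matches_known_gene` and `contains_repeat`), and "an ORF without a stop codon" is modelled simply as at least one forward reading frame that is free of stop codons over the whole read. It is not the MIPS tooling referred to above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C residues in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_open_frame(seq: str) -> bool:
    """True if at least one of the three forward frames contains no stop codon."""
    seq = seq.upper()
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def passes_r1_screen(seq: str, matches_known_gene: bool,
                     contains_repeat: bool, min_gc: float = 0.40) -> bool:
    """Apply criteria a) to d) to a 5' EST read."""
    return (not matches_known_gene          # a) no identical match to a known gene
            and has_open_frame(seq)         # b) open reading frame present
            and gc_content(seq) > min_gc    # c) GC content above 40%
            and not contains_repeat)        # d) no repeat structure


print(passes_r1_screen("ATGGCCGCCGGC", matches_known_gene=False, contains_repeat=False))
```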

Novel clones that met all criteria were identified to the sequencers, who then 
performed 3'-end sequencing of these clones. The resulting 3' ESTs ("s1 ESTs" = sequenced 
from 3'-end) were checked for: 

a) the lack of matches with known genes in public databases, and sequences already 
generated by us. 

This was done by blasting against EMBL/EMBLNEW and assembled ESTs (BLASTn 
and BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, 
for description and parameter settings). 

b) the presence of polyadenylation signals. 

Again only clones matching the selection criteria were chosen to be sequenced 
completely by the sequencers. Clones were selected according to the following criteria: 

A very good ORF had at least one BLASTx match to other proteins. A "good ORF" 
should extend to the 3' end and be longer than ~40 codons. If the ORF started in the r1 
sequence, there should not be too many competing start codons in front of the potential 
start codon and in frame with it, and the start should match the Kozak consensus 
ATG. If the EST sequence was too short to decide on the potential ORF, and there 
were only a few or no start codons in the sequence, the GC content of the sequence should be 
greater than 40%. The r1 sequences did not need to contain a polyA tail at the 3' end. In 
addition, the results of the blasting against the assembled human ESTs could help in 
questionable cases to decide whether to stop or to continue. A hit against these ESTs was an 
indication to go further. 

Clones passing the above-described screening were sequenced in full. Sequencing was 
done preferentially using dye terminator chemistry (ABD or Amersham) on ABI automated 
DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL prototype 
instruments (Arakis), mainly with dye primer chemistry. Primer walking (Strauss et al., 1986, 
Specific-primer-directed DNA sequencing. Anal. Biochem. 154, 353-360) was the preferred 
sequencing strategy because of the lower redundancy possible compared to random shotgun 
(Messing, J., Crea, R., Seeburg, H.P. (1981) A system for shotgun DNA sequencing. Nucleic 
Acids Res. 9, 32-39) methods. Walking primers were generally designed using software (e.g. 
Haas, S., Vingron, M., Poustka, A., Wiemann, S. (1998) Primer design in large-scale 
sequencing. Nucleic Acids Res. 26, 3006-3012; Schwager, C., Wiemann, S., Ansorge, W. 
(1995) GeneSkipper: integrated software environment for DNA sequence assembly and 
alignment. HUGO Genome Digest 2, 8-9) that permitted complete automation of this usually 
time-consuming process and helped in the parallel processing of large numbers of clones. 

EXAMPLE III: Bioinformatics analysis of full length cDNAs 

Each sequence obtained was compared on nucleotide level in a stepwise manner to 
sequences in EMBL/EMBLNEW, EMBL-EST, EMBL-STS using the BLASTn algorithm. 
Basic Local Alignment Search Tool (BLAST; Altschul, S. F. (1993) J Mol Evol 36:290-300; 
Altschul, S. F. et al. (1990) J Mol Biol 215:403-10) is used to search for local sequence 
alignments. BLAST produces alignments of both nucleotide (BLASTn) and amino acid 
sequences (BLASTp or BLASTx) to determine sequence similarity. BLAST is especially 
useful in determining exact matches or in identifying homologs, because of the local nature of 
the alignments. While it is useful for matches which do not contain gaps, it is inappropriate 
for performing motif-style searching. The fundamental unit of BLAST algorithm output is the 
High-scoring Segment Pair (HSP). 

An HSP consists of two sequence fragments of arbitrary but equal lengths whose 
alignment is locally maximal and for which the alignment score meets or exceeds a 
threshold or cutoff score set by the user. The BLAST approach is to look for HSPs between 
a query sequence and a database sequence, to evaluate the statistical significance of any 
matches found, and to report only those matches which satisfy the user-selected threshold 
of significance. The 
parameter E establishes the statistically significant threshold for reporting database sequence 
matches. E is interpreted as the upper bound of the expected frequency of chance occurrence 
of an HSP (or set of HSPs) within the context of the entire database search. Any database 
sequence whose match satisfies E is reported in the program output. Parameter settings for 
the BLAST operations (BLASTN 2.0a19MP-WashU) described were: EMBL-EMBLNEW: 
H=0 V=5 B=5 -filter seg; EMBL-EST: H=0 E=1e-10 B=500 V=500 -filter seg; EMBL-STS: 
H=0 V=5 B=5. 
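
The role of the parameter E can be made concrete with the Karlin-Altschul relation E = K·m·n·exp(-λS), where S is the HSP score, m and n are the query and database lengths, and K and λ are statistical parameters that depend on the scoring system. The Python sketch below is illustrative only; the numerical values of K and λ and the lengths used in the example are assumptions chosen for demonstration, not values taken from the searches described here.

```python
import math


def expected_hits(score: float, query_len: int, db_len: int,
                  k: float, lam: float) -> float:
    """Expected number of chance HSPs scoring at least `score` (E-value)."""
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e_value: float) -> float:
    """Probability of observing at least one such chance HSP."""
    return 1.0 - math.exp(-e_value)


def is_reported(score: float, query_len: int, db_len: int,
                k: float, lam: float, e_threshold: float) -> bool:
    """A match is reported only if its E-value does not exceed the threshold E."""
    return expected_hits(score, query_len, db_len, k, lam) <= e_threshold


# Hypothetical example: 1,000-residue query, 1,000,000-residue database,
# assumed K = 0.1 and lambda = 0.3, threshold E = 1e-10.
for s in (100, 200):
    e = expected_hits(s, 1000, 1_000_000, 0.1, 0.3)
    print(s, e, is_reported(s, 1000, 1_000_000, 0.1, 0.3, 1e-10))
```

Under these assumed parameters the lower-scoring match exceeds the threshold and would be suppressed, while the higher-scoring match would be reported.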

Search against EMBL/EMBLNEW was done to determine whether the cDNAs are 
already known, and also to find out whether the cDNAs are encoded by genomic sequences 
already sequenced and published/submitted to these databases. 

Search against EMBL-EST was performed to get a first impression of how abundant a 
particular cDNA would be and to get information on tissue specificity (so-called "electronic 
Northern blot"; e.g., some of the cDNAs derived from the testis library show hits only to ESTs 
also derived from testis libraries). 
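
An "electronic Northern blot" of this kind amounts to tallying EST hits by the tissue of the source library. The short Python sketch below assumes that each hit has already been reduced to a pair of EST identifier and tissue name; the identifiers and the helper names are hypothetical.

```python
from collections import Counter


def electronic_northern(est_hits):
    """Count EST hits per source tissue.

    `est_hits` is an iterable of (est_id, tissue) pairs.
    """
    return Counter(tissue for _, tissue in est_hits)


def is_tissue_specific(est_hits, tissue: str) -> bool:
    """True if there is at least one hit and all hits come from `tissue`."""
    counts = electronic_northern(est_hits)
    return bool(counts) and set(counts) == {tissue}


# Hypothetical hits for one cDNA.
hits = [("est001", "testis"), ("est002", "testis"), ("est003", "testis")]
print(electronic_northern(hits))
print(is_tissue_specific(hits, "testis"))
```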

The cDNA sequences were blasted against EMBL-STS to identify STS sequences 
matching the cDNA, thus providing mapping information for the new cDNA. 

The potential protein-sequences were generated automatically by a script searching 
for the longest open reading frame (ORF) in each of the three forward frames with a 
minimum length of 90 codons. Next, the automatically generated ORFs were translated into 
protein sequences. These protein sequences were searched against the non redundant protein 
data set of PIR/SwissProt/Trembl/Tremblnew (BLASTP 2.0a19MP-WashU, parameter 
setting: V=7 B=7 H=0 -filter seg). If the script generated more than one ORF, one ORF was 
chosen manually by the annotator according to the degree of similarity to known proteins, the 
location of the ORF in the cDNA, the length, the amino acid composition and the content of 
Prosite-Motifs. 
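
The automatic ORF search can be sketched as follows. This Python example is an assumption-laden illustration rather than the script actually used: it defines an ORF as running from an ATG to the next in-frame stop codon, ignores ORFs that run off the end of the sequence, and reports for each forward frame the longest ORF with at least `min_codons` codons (90 by default, as stated above). Coordinates are 0-based with an exclusive end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orfs(seq: str, min_codons: int = 90):
    """Return {frame: (start, end, n_codons) or None} for the three forward frames.

    `n_codons` excludes the stop codon; `end` is the index just past the stop codon.
    """
    seq = seq.upper()
    result = {}
    for frame in range(3):
        best = None
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3    # coding codons before the stop
                if n_codons >= min_codons and (best is None or n_codons > best[2]):
                    best = (start, i + 3, n_codons)
                start = None                   # look for the next ORF in this frame
        result[frame] = best
    return result


# Toy example with a lowered minimum length so that the short ORF is reported.
toy = "CC" + "ATG" + "GCC" * 3 + "TAA"
print(longest_orfs(toy, min_codons=3))
```

When more than one frame yields an ORF, the choice among them is left to manual annotation, as described in the text.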

Additionally there was a BLASTx (BLASTX 2.0a19MP-WashU against non 
redundant protein database comprising PIR/SWISSPROT/TREMBL/TREMBLNEW; 
parameter settings were: matrix/home/data/blast/matrix/aa/BLOSUM62 H=0 V=5 B=5 -filter 
seg) search to find potential frame shifts in the complementary cds of the cDNAs and to 
identify unspliced or partly spliced cDNAs. The protein sequence was then transferred to the 
PEDANT system, in order to generate additional information on the new proteins. PEDANT 
(Protein Extraction, Description, and ANalysis Tool, Frishman, D. & Mewes, H.-W. (1997) 
PEDANTic genome analysis. Trends in Genetics 13, 415-416) is a platform developed at the 
Munich Information Center for Protein Sequences (MIPS, Munich, Germany), which 
incorporates practically all bioinformatics methods important for the functional and structural 
characterisation of protein sequences. Computational methods used by PEDANT are: 

FASTA 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Pearson W.R. (1990) Rapid and sensitive sequence comparison with FASTP 
and FASTA. Methods Enzymol. 183, 63-98. 

BLAST2 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Altschul S.F., Gish W., Miller W., Myers E.W., and Lipman D.J. (1990) Basic local 
alignment search tool. Journal of Molecular Biology 215, 403-10. 

PREDATOR 

High-accuracy secondary structure prediction from single and multiple sequences. 
Frishman, D. and Argos, P. (1997) 75% accuracy in protein secondary structure prediction. 
Proteins 27, 329-335. Frishman, D. and Argos, P. (1996) Incorporation of long-distance 
interactions in a secondary structure prediction algorithm. Prot. Eng. 9, 133-142. 

STRIDE 

Secondary structure assignment from atomic coordinates. Frishman, D. and Argos, 
P. (1995) Knowledge-based secondary structure assignment. Proteins 23, 566-579. 

CLUSTALW 

Multiple sequence alignment. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) 
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through 
sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids 
Research 22, 4673-4680. 

TMAP 

Transmembrane region prediction from multiply aligned sequences. Persson, B. and 
Argos, P. (1994) Prediction of transmembrane segments in proteins utilising multiple 
sequence alignments. J. Mol. Biol. 237, 182-192. 

ALOM2 

Transmembrane region prediction from single sequences. Klein, P., Kanehisa, M., 
and DeLisi, C. Prediction of protein function from sequence properties: A discriminant 
analysis of a database. Biochim. Biophys. Acta 787, 221-226 (1984). Version 2 by Dr. K. 
Nakai. 

SIGNALP 

Signal peptide prediction. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. 
(1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their 
cleavage sites. Protein Engineering 10, 1-6. 

SEG 

Detection of low complexity regions in protein sequences. Wootton, J.C., Federhen, 
S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. 
Computers & Chemistry 17, 149-163. 

COILS 

Detection of coiled coils. Lupas, A., M. Van Dyke, and J. Stock, "Predicting Coiled 
Coils from Protein Sequences." Science (1991) 252, 1162-1164. 

PROSEARCH 

Detection of PROSITE protein sequence patterns. Kolakowski L.F. Jr., Leunissen 
J.A.M., Smith J.E. (1992) ProSearch: fast searching of protein sequences with regular 
expression patterns related to protein structure and function. Biotechniques 13, 919-921. 

BLIMPS 

Similarity searches against a database of ungapped blocks. J.C. Wallace and Henikoff 
S., (1992) PATMAT: a searching and extraction program for sequence, pattern and block 
queries and databases, CABIOS 8, 249-254. Written by Bill Alford. 

HMMER 

Hidden Markov model software. Sonnhammer E.L.L., Eddy S.R., Durbin R. (1997) 
Pfam: A Comprehensive Database of Protein Families Based on Seed Alignments. Proteins 
28, 405-420. 

pI 

Perl script that returns the amino acid composition, molecular weight, theoretical pI, and 
expected extinction coefficient of an amino acid sequence. By Fred Lindberg. 

The parameter settings were as follows: known3d: score > 100; BLAST: E-value < 10; SCOP: <= 
50 alignments, E-value < 0.0001; signalp: Y?=0.7, examined from the N-terminus: 50 aa; 
funcat: E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0; COILS: threshold 
0.95; SEG: threshold 20.0; BLAST in report: E-value < 0.001; PIR-KW, superfamilies, EC 
numbers in report: E-value < 0.00001; known3d in report: score > 120. 
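
Two of the quantities returned by the pI script, amino acid composition and molecular weight, are simple to compute. The Python sketch below is a stand-in written for illustration and is not the Perl script itself; it uses average residue masses and adds one water molecule for the free termini, and it does not attempt the pI or extinction coefficient calculations.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def composition(protein: str) -> Counter:
    """Number of occurrences of each amino acid (one-letter code)."""
    return Counter(protein.upper())


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS


print(composition("MAASGVEKSS"))
print(round(molecular_weight("MAASGVEKSS"), 2))
```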

The results of PEDANT analysis, together with the results of the similarity searches, 
constitute the basis for the structural and functional annotation of the cDNAs and the encoded 
proteins, as specified below. 

EXAMPLE IV: CELLULAR LOCALIZATIONS OF GFP-FUSION PROTEINS 

Plasmids of cDNA-GFP fusions were transfected into mammalian tissue culture cells 
and allowed to express the proteins for up to 48 hours. Live cells were imaged at 24 hours 
and 48 hours after transfection and the localisations recorded. The chart, below, depicts the 
apparent final cellular localisations of 107 cDNA-GFP fusions. 

In order to minimize the possibility of the GFP interfering with protein function 
and/or localization, two separate populations of cDNAs were generated encoding N-terminal 
or C-terminal GFP fusions. Clearly this appears to be a crucial strategy, since overall only 
56% of the proteins localised to a specific compartment irrespective of the position of the 
GFP. In the instances where only one fusion localized, the complementary fusion either gave 
no expression or a nuclear and cytosolic staining characteristic of GFP expressed alone. 

Each cDNA in turn was subjected to bioinformatic analysis. Where possible, the 
potential subcellular localisations of the expressed proteins were determined. This 
information was then compared to the actual localisations determined from expression of the 
GFP-fusion proteins in mammalian cells. 

DKFZphfbr2_16c16 

group: Cell structure and motility 

DKFZphfbr2_16c16.3 encodes a novel 586 amino acid protein with similarity to the human actin 
binding protein MAYVEN and Drosophila Kelch. 

MAYVEN is a novel actin binding protein predominantly expressed in brain. Drosophila kelch is 
involved in the maintenance of ring canal organization during oogenesis. The amino half of the 
protein including the BTB domain mediates dimerization, while the amino half might allow 
cross-linking of ring canal actin filaments, thus organising the inner rim cytoskeleton. The 
kelch repeat domain is necessary for ring canal localisation and believed to mediate an 
additional interaction, possibly with actin. The new protein shares the features of both 
proteins and therefore should be involved in the organisation of cytoskeleton binding to 
membrane proteins. 

The new protein can find application in modulating/blocking of cytoskeleton-membrane protein 
interaction. 


similarity to Drosophila kelch 
complete cDNA, complete cds, EST hits 

on genomic level partly encoded by AC005082 and AC006039 
Sequenced by Qiagen 
Locus: unknown 


Insert length: 3028 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2984 


1 GGGGGCCCGG GGACX3CAGCC CAGTTGGTAG CGTCX3CTCCC TGAGCGTTTC 
51 TAAGGGGGCC GCCCGGCCCT GTCTTTCGGC AGTGGCCGAG CCACCGCCGC 
101 CTGCCGCGCG TTCCAGAGCT GGGCGCTGCA GCTGCACTGC CGATCGCCGT 
151 GTTTGGTCGA TAGAATCCCC AGTGTGCCCA GAGAGTGCGA CCCCTCGCCC 
201 GGCCCGGCGA GCCCCGGGCG TGAACCGAGC TGAGGGAGGA TGGCAGCCTC 
251 TGGGGTGGAG AAGAGCAGCA AGAAGAAGAC CGAGAAGAAA CTTGCTGCTC 
301 GGGAAGAAGC TAAATTGTTG GCGGGTTTCA TGGGCGTCAT GAATAACATG 
351 CGGAAACAGA AAACGTTGTG TGACGTGATC CTCATGGTCC AGGAAAGAAA 
401 GATACCTGCT CATCGTGTTG TTCTTGCTGC AGCCAGTCAT TTTTTTAACT 
451 TAATGTTCAC AACTAACATG CTTGAATCAA AGTCCTTTGA AGTAGAACTC 
501 AAAGATGCTG AACCTGATAT TATTGAACAA CTGGTGGAAT TTGCTTATAC 
551 TGCTAGAATT TCCGTGAATA GCAACAATGT TCAGTCTTTG TTGGATGCAG 
601 CAAACCAATA TCAGATTGAA CCTGTCAAGA AAATGTGTGT TGATTTTTTG 
651 AAAGAACAAG TTGATGCTTC AAATTGTCTT GGTATAAGTG TGCTAGCGGA 
701 GTGTCTAGAT TGTCCTGAAT TGAAAGCAAC TGCAGATGAC TTTATTCATC 
751 AGCACTTTAC TGAAGTTTAC AAAACTGATG AATTTCTTCA ACTTGATGTC 
801 AAGCGAGTAA CACATCTTCT CAACCAGGAC ACTCTGACTG TGAGAGCAGA 
851 GGATCAGGTT TATGATGCTG CAGTCAGGTG GTTGAAATAT GATGAGCCTA 
901 ATCGCCAGCC ATTTATGGTT GATATCCTTG CTAAAGTCAG GTTTCCTCTT 
951 ATATCAAAGA ATTTCTTAAG TAAAACGGTA CAAGCTGAAC CACTTATTCA 
1001 AGACAATCCT GAATGCCTTA AGATGGTGAT AAGTGGAATG AGGTACCATC 
1051 TACTGTCTCC AGAGGACCGA GAAGAACTTG TAGATGGCAC AAGACCTAGA 
1101 AGAAAGAAAC ATGACTACCG CATAGCCCTA TTTGGAGGCT CTCAACCACA 
1151 GTCTTGTAGA TATTTTAACC CAAAGGATTA TAGCTGGACA GACATCCGCT 
1201 GCCCCTTTGA AAAACGAAGA GATGCAGCAT GCGTGTTTTG GGACAATGTA 
1251 GTATACATTT TGGGAGGCTC TCAGCTTTTC CCAATAAAGC GAATGGACTG 
1301 CTATAATGTA GTGAAGGATA GCTGGTATTC GAAACTGGGT CCTCCGACAC 
1351 CTCGAGACAG CCTTGCTGCA TGTGCTGCAG AAGGCAAAAT TTATACATCT 
1401 GGAGGTTCAG AAGTAGGAAA CTCAGCTCTG TATTTATTTG AGTGCTATGA 
1451 TACGAGAACT GAAAGCTGGC ACACAAAGCC CAGCATGCTG ACCCAGCGCT 
1501 GCAGCCATGG GATGGTGGAA GCCAATGGCC TAATCTATGT TTGTGGTGGA 
1551 AGTTTAGGAA ACAATGTTTC AGGGAGAGTG CTTAATTCXTT GTCAAGTTTA 
1601 TGATCCTGCC ACAGAAACAT GGACTGAGCT GTGTCC3UITG ATTGAAGCCA 
1651 GGAAGAATCA TGGGCTGGTA TTTGTAAAAG ACAAGATATT TGCTGTGGGT 
1701 GGTCAGAATG GTTTAGGTGG TCTGGACAAT GTGGAATATT ACGATATTAA 
1751 GTTGAACGAA TGGAAGATGG TCTCACCAAT GCCATGGAAG GGTGTAACAG 
1801 TGAAATGTGC AGCAGTTGGC TCTATAGTTT ATGTCTTGGC TGGTTTTCAG 
2301 AGAAGATTGG CTCATCAGTG AAGCGCAGTA TCTTAGCTCT AGATTCTATT 
2351 TTCATGCATC ACAGAAGTGC TATACGGTTA GGTCTGTTTG TGCTCAGTCA 
2401 AGAACTAAGA AATAGTATGA ATTGTAAGTC AAGATGGGCA ACTCAGATGG 
2451 AGCAGCTTAG TCTCACAGTT TGCTTGTCTA TTTATTTTAT TTAGTGCCAA 
2501 ATGTATTCCA TTTTAAAAGT AAGCCAGAGT GAGTCAAGGC ATATACACAC 
2551 TTTCTCACAA AACTTCCTAA ACAGATTTGG GGGTTTAATA TGTCCAACTC 
2601 CTCATGAAAT ATATTCAATC CACTTAAATA TATTCCATCT TTTTAACATA 
2651 AAATGTAAAG CTTAGCACCC ATCATTAATT TATGTCTCTG TTTTATCCAG 
2701 TGGTTAAAAA AGGATTCTGC CTCTTTAGTC CTCACTGTTA AATAAAACCC 
2751 AATCATAGTA AGTGATTAAC TAGCAAAAAG TAAAGCTATT TATAGCAAAT 
2801 TTCTAGATCA TTAGAAAAGC ACTGGTAGTT GTACAATATC AGTGTTGACT 
2851 TTGAACTTCT TTAACGAGAT CATGAATTCT TTTCCCTTAG CCAAAACATG 
2901 AAATATTTAA CCTAGTTGTC TCTAAAAGTT TTGTAATCAT GAGTTAGATA 
2951 TATGTCATCT CCTATTCATT GCTTTTATGT GATCAATAAA TCTTTTACAA 
3001 ACCCAAAAGA AAAAAAAAAA AAAAAAAA 


BLAST Results 


Entry AC005082 from database EMBL: 

Homo sapiens clone RG271G13; HTGS phase 1, 7 unordered pieces. 
Score = 6460, P = 0.0e+00, identities = 1292/1292 

4 exons matching Bp 1180-3007 

Entry AC006039 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens clone NH0319F03; HTGS phase 
1, 3 unordered pieces. 

Score = 1780, P = 2.0e-117, identities = 368/377 

5 exons matching Bp 6-860 

Entry HSG20603 from database EMBL: 
human STS A005y34. 
Score = 670, P = 1.0e-23, identities = 134/134 


Medline entries 


93201592: 

kelch encodes a component of intercellular bridges in 
Drosophila egg chambers. 

97412177: 

Drosophila kelch is an oligomeric ring canal actin organizer. 

Peptide information for frame 3 


ORF from 240 bp to 1997 bp; peptide length: 586 
Category: strong similarity to known protein 


1 MAASGVEKSS KKKTEKKLAA REEAKLLAGF MGVMNNMRKQ KTLCDVILMV 

51 QERKIPAHRV VLAAASHFFN LMFTTNMLES KSFEVELKDA EPDIIEQLVE 

101 FAYTARISVN SNNVQSLLDA ANQYQIEPVK KMCVDFLKEQ VDASNCLGIS 

151 VLAECLDCPE LKATADDFIH QHFTEVYKTD EFLQLDVKRV THLLNQDTLT 

201 VRAEDQVYDA AVRWLKYDEP NRQPFMVDIL AKVRFPLISK NFLSKTVQAE 

251 PLIQDNPECL KMVISGMRYH LLSPEDREEL VDGTRPRRKK HDYRIALFGG 

301 SQPQSCRYFN PKDYSWTDIR CPFEKRRDAA CVFWDNVVYI LGGSQLFPIK 

351 RMDCYNVVKD SWYSKLGPPT PRDSLAACAA EGKIYTSGGS EVGNSALYLF 

401 ECYDTRTESW HTKPSMLTQR CSHGMVEANG LIYVCGGSLG NNVSGRVLNS 

451 CEVYDPATET WTELCPMIEA RKNHGLVFVK DKIFAVGGQN GLGGLDNVEY 

501 YDIKLNEWKM VSPMPWKGVT VKCAAVGSIV YVLAGFQGVG RLGHILEYNT 

551 ETDKWVANSK VRAFPVTSCL ICVVDTCGAN EETLET 

BLASTP hits 

Entry KELC_DROME from database SWISSPROT: 
RING CANAL PROTEIN (KELCH PROTEIN). 
Length = 689 

Score = 816 (287.2 bits), Expect = 1.9e-81, P = 1.9e-81 
Identities = 187/542 (34%), Positives = 290/542 (53%) 

Entry AC004021_1 from database TREMBL: 

"WUGSC:H_DJ0186K10.1"; Human PAC clone DJ0186K10 from 5q31, 
complete sequence. Homo sapiens (human) 
Length = 497 

Score = 704 (247.8 bits), Expect = 1.4e-69, P = 1.4e-69 
Identities = 163/483 (33%), Positives = 253/483 (52%) 

Entry HSDKG12_1 from database TREMBL: 

"KIAA0132"; Human mRNA for KIAA0132 gene, complete cds. Homo 

sapiens (human) 
Length = 624 

Score = 692 (243.6 bits), Expect = 2.6e-68, P = 2.6e-68 
Identities = 175/527 (33%), Positives = 272/527 (51%) 

Entry A45773 from database PIR: 

kelch protein, long form - fruit fly (Drosophila melanogaster) 
Length = 1476 

Score = 817 (287.6 bits), Expect = 1.7e-80, P = 1.7e-80 
Identities = 189/549 (34%), Positives = 292/549 (53%) 
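
The percentages printed in the hit summaries above are consistent with taking the integer part of the ratio rather than rounding it (for example, 187/542 is reported as 34%). The following Python sketch illustrates that reading. The function name is hypothetical, and the truncation rule is an inference from the listed values, not a documented property of the report format.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage as printed in the hit summaries (truncated, not rounded)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Values copied from the KELC_DROME and A45773 hits listed above.
for matches, aligned in [(187, 542), (290, 542), (189, 549), (292, 549)]:
    print(f"{matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```

Integer division is used so that the result never rounds up, which matches every identity and positive percentage given for these hits.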


Alert BLASTP hits for DKFZphfbr2_16cl6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16cl6, frame 3 


Report for DKFZphfbr2_16cl6.3 


[LENGTH] 586 
[MW] 65992.06 
[pI] 6.08 
[HOMOL] PIR:A45773 kelch protein, long form - fruit fly (Drosophila melanogaster) 5e- 
[BLOCKS] BL00075D Dihydrofolate reductase proteins 
[SCOP] d1gog_3 2.46.1.1.1 (151-537) Galactose oxidase, central domai 6e-36 
[PIRKW] zinc finger 2e-11 
[PIRKW] DNA binding 9e-10 
[PIRKW] transcription factor 1e-06 
[SUPFAM] A55R protein middle region homology 1e-35 
[SUPFAM] POZ domain homology 1e-35 
[SUPFAM] vaccinia virus 59K HindIII-C protein 5e-15 
[SUPFAM] A55R protein 1e-35 
[SUPFAM] myxoma virus M9-R protein 2e-11 
[SUPFAM] A55R protein carboxyl-terminal homology 1e-35 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] MYRISTYL 8 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 3.75 % 



SEQ MAASGVEKSSKKKTEKKLAAREEAKLLAGFMGVMNNMRKQKTLCDVILMVQERKIPAHRV 

SEG XXXXXXXKXXXXXXXXXXXXXX 

PRD ccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeccccchhhhhe 

SEQ VLAAASHFFNLMFTTNMLESKSFEVELKDAEPDIIEQLVEFAYTARISVNSNNVQSLLDA 

SEG 

PRD eeccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhheeeeccchhhhhhhh 

SEQ ANQYQIEPVKKMCVDFLKEQVDASNCLGISVLAECLDCPELKATADDFIHQHFTEVYKTD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EFLQLDVKRVTHLLNQDTLTVRAEDQVYDAAVRWLKYDEPNRQPFMVDILAKVRFPLISK 

SEG 

PRD hhhchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhccch 

SEQ NFLSKTVQAEPLIQDNPECLKMVISGMRYHLLSPEDREELVDGTRPRRKKHDYRIALFGG 

SEG 

PRD hhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccccccccceeeeeeecc 

SEQ SQPQSCRYFNPKDYSWTDIRCPFEKRRDAACVFWDNVVYILGGSQLFPIKRMDCYNVVKD 

SEG 

PRD ccccceeeccccccccccccccccccceeeeeeeceeeeeeccccccccceeeecccccc 

SEQ SWYSKLGPPTPRDSLAACAAEGKIYTSGGSEVGNSALYLFECYDTRTESWHTKPSMLTQR 

SEG 

PRD cccccccccccccceeeeeccceeeeeccccccccceeeeeecccccccccccccccccc 
SEQ CSHGMVEANGLIYVCGGSLGNNVSGRVLNSCEVYDPATETWTELCPMIEARKNHGLVFVK 

SEG 

PRD ccceeeecceeeeeecccccccccccccceeeeccccccccccccccccccccceeeeec 

SEQ DKIFAVGGQNGLGGLDNVEYYDIKLNEWKMVSPMPWKGVTVKCAAVGSIVYVLAGFQGVG 

SEG 

PRD ceeeecccccccccccceeeccccccceeecccccccccceeeeeccceeeeeccccccc 

SEQ RLGHILEYNTETDKWVANSKVRAFPVTSCLICVVDTCGANEETLET 

SEG 

PRD cccceeecccccccccccccccccccceeeeeeeeccccccccccc 


Prosite for DKFZphfbr2_16cl6.3 


PS00001 442->446 ASN_GLYCOSYLATION PDOC00001 
PS00004 11->15 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 188->192 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 
PS00005 10->13 PKC_PHOSPHO_SITE PDOC00005 
PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 418->421 PKC_PHOSPHO_SITE PDOC00005 
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 
PS00005 520->523 PKC_PHOSPHO_SITE PDOC00005 
PS00005 552->555 PKC_PHOSPHO_SITE PDOC00005 
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 315->319 CK2_PHOSPHO_SITE PDOC00006 
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006 
PS00006 405->409 CK2_PHOSPHO_SITE PDOC00006 
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006 
PS00006 550->554 CK2_PHOSPHO_SITE PDOC00006 
PS00007 202->209 TYR_PHOSPHO_SITE PDOC00007 
PS00008 5->11 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 389->395 MYRISTYL PDOC00008 
PS00008 424->430 MYRISTYL PDOC00008 
PS00008 436->442 MYRISTYL PDOC00008 
PS00008 440->446 MYRISTYL PDOC00008 
PS00008 487->493 MYRISTYL PDOC00008 
PS00008 493->499 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16cl6.3) 
group: brain derived 

DKFZphfbr2_16f21 encodes a novel 208 amino acid protein with strong similarity to human zinc 
finger protein 216. 

The protein shows strong similarity to the human zinc finger protein 216, but has no Zn 
finger. No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


strong similarity to zinc finger protein 216 

complete cDNA, complete cds, EST hits 
start matches Kozak consensus ANNatgG, 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1512 bp 

Poly A stretch at pos. 1490, polyadenylation signal at pos. 1474 


1 GGGAGCAAGC AGGGGTTCGG CGGCATTACC TGTACCCATT CACCGGCGGC 
51 TACCGGCGGC GGCGCGTAGC GTGTCAGGCG GAGAGACCCG CCGCCAGGTG 
101 TGCAACTGAG GAACATGGCT CAAGAAACTA ATCACAGCCA AGTGCCTATG 
151 CTTTGTTCCA CTGGCTGTGG ATTTTATGGA AACCCTCGTA CAAATGGCAT 
201 GTGTTCAGTA TGCTATAAAG AACATCTTCA AAGACAGAAT AGTAGTAATG 
251 GTAGAATAAG CCCACCTGCA ACCTCTGTCA GTAGTCTGTC TGAATCTTTA 
301 CCAGTTCAAT GCACAGATGG CAGTGTGCCA GAAGCCCAGT CAGCATTAGA 
351 CTCTACATCT TCATCTATGC AGCCCAGCCC TGTATCAAAT CAGTCACTTT 
401 TATCAGAATC TGTAGCATCT TCTCAATTGG ACAGTACATC TGTGGACAAA 
451 GCAGTACCTG AAACAGAAGA TGTGCAGGCT TCAGTATCAG ACACAGCACA 
501 GCAGCCATCT GAAGAGCAAA GCAAGCCTCT TGAAAAACCG AAACAAAAAA 
551 AGAATCGCTG TTTCATGTGC AGGAAGAAAG TGGGACTTAC TGGGTTTGAA 
601 TGCCGGTGTG GAAATGTTTA CTGTGGTGTA CACCGTTACT CAGATGTACT 
651 CAATTGCTCT TACAATTACA AAGCCGATGC TGCTGAGAAA ATCAGAAAAG 
701 AAAATCCAGT AGTTGTTGGT GAAAAGATCC AAAAGATTTG AACTCCTGCT 
751 GGAATACAAA ATTCTTGAGC ATCTGCAAAC TAAAAATTGA CTTGAGGTTT 
801 TTTTTTTCCT AGTCATTGGG AATGTAGAGC AGTGTATCTT GCATGTCATC 
851 GGAAGAATAG ATTTTTGTTT TGGTTTTGTT TTGAAAATGA CTCTGAACAT 
901 TTATTTCCAT TGCAATTTCT GTGGCTGAGG AGACTTAAAC TTTACAAGTA 
951 TTATCCTTTT AAGATCATTT TAATTTTAGT TGAGTGCAGA GGGCTTTTAT 
1001 AACAAACGTG CAGAAATTTT GGAGGGCTGT GATTTTTCCA GTATTAAACA 
1051 TGCATGCATT AATCTTGCAG TTTATTTTCT CATTATGTAT GTATATATCG 
1101 CTTTTCTCTG CAGCACGATT TCTCTTTTGA TAATGCCCTT TAGGGCACAA 
1151 CTAGTTATCA GTAACTGAAT GTATCTTAAT CATTATGGCT GCTTCTGTTT 
1201 TTTCATTAAC AAAGGTTATT CATATGTTAG CATATAGTTT CTTTGCACCC 
1251 ACTATTTATG TCTGAATCAT TTGTCACAAG AGAGTGTGTG CTGATGAGAT 
1301 TGTAAGTTTG TGTGTTTAAA CTTTTTTTTG AGCGAGGGAA GAAAAAGCTG 
1351 TATGCATTTC ATTGCTGTCT ACAGGTTTCT TTCAGATTAT GTTCATGGGT 
1401 TTGTGTGTAT ACAATATGAA GAATGATCTG AAGTAATTGT GCTGTATTTA 
1451 TGTTTATTCA CCAGTCTTTG ATTAAATAAA AAGGAAAACC AGAAAAAAAA 
1501 AAAAAAAAAA AA 
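
The annotation above states that the start codon matches the Kozak consensus ANNatgG. The following Python sketch shows one way such a check could be made against the listing, using the row that begins at nucleotide 101 and the ORF start at 115 bp reported for this clone. The function name and the 1-based coordinate handling are assumptions made for illustration, not the method used to produce the annotation.

```python
import re

# Nucleotides 101-150 of the listing above, copied from the row labelled "101".
ROW_START = 101
ROW = "TGCAACTGAGGAACATGGCTCAAGAAACTAATCACAGCCAAGTGCCTATG"


def matches_kozak(seq: str, atg_pos: int, seq_start: int = 1) -> bool:
    """Return True if the ATG at 1-based position atg_pos fits the pattern ANNatgG."""
    i = atg_pos - seq_start  # 0-based index of the A of the ATG
    if i < 3 or i + 4 > len(seq):
        return False  # not enough flanking sequence in this fragment
    window = seq[i - 3:i + 4].upper()  # positions -3 .. +4 relative to the A of ATG
    return re.fullmatch(r"A[ACGT]{2}ATGG", window) is not None


print(matches_kozak(ROW, 115, ROW_START))
```

The window spans three bases upstream of the ATG and one base downstream, which is exactly the extent of the ANNatgG pattern quoted in the annotation.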


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 
ORF from 115 bp to 738 bp; peptide length: 208 
Category: strong similarity to known protein 
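
The ORF coordinates and the peptide length are related by simple arithmetic: the span from 115 bp to 738 bp is 624 nucleotides, or 208 codons, so the quoted range does not include the stop codon. The Python sketch below checks that relation and translates the first codons of the ORF from the row of the listing that begins at nucleotide 101. The function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_codons(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    length = end_bp - start_bp + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate complete codons of an uppercase DNA string."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 101-150 of the cDNA listing above; the ORF starts at 115.
ROW_START = 101
ROW = "TGCAACTGAGGAACATGGCTCAAGAAACTAATCACAGCCAAGTGCCTATG"

print(orf_codons(115, 738))
print(translate(ROW[115 - ROW_START:]))
```

The translated fragment can be compared with the first residues of the peptide listed below.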


1 MAQETNHSQV PMLCSTGCGF YGNPRTNGMC SVCYKEHLQR QNSSNGRISP 
51 PATSVSSLSE SLPVQCTDGS VPEAQSALDS TSSSMQPSPV SNQSLLSESV 
101 ASSQLDSTSV DKAVPETEDV QASVSDTAQQ PSEEQSKPLE KPKQKKNRCF 
151 MCRKKVGLTG FECRCGNVYC GVHRYSDVLN CSYNYKADAA EKIRKENPVV 
201 VGEKIQKI 


BLASTP hits 

Entry ATF7H19_1 from database TREMBLNEW: 

gene: "F7H19.10"; product: "putative protein"; Arabidopsis thaliana DNA 
chromosome 4, BAC clone F7H19 (ESSAII project) >TREMBL:ATT12H17_21 
gene: "T12H17.210"; product: "predicted protein"; Arabidopsis thaliana 
DNA chromosome 4, BAC clone T12H17 (ESSAII project) 
Score = 206, P = 2.1e-24, identities = 51/146, positives = 77/146 

Entry PVPVPr3A_1 from database TREMBL: 

gene: "PVPRa"; P. vulgaris PVPR3 protein mRNA, complete cds. 
Score = 237, P = 4.9e-20, identities = 50/136, positives = 73/136 

Entry AF062072_1 from database TREMBL: 

gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc 
finger protein 216 (ZNF216) gene, complete cds. 

Score = 591, P = 1.6e-57, identities = 124/215, positives = 147/215 


Alert BLASTP hits for DKFZphfbr2_16f21, frame 1 

TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus 
zinc finger protein ZNF216 mRNA, complete cds., N = 1, Score = 590, P = 
2.1e-57 


TREMBLNEW:AB001773_1 gene: "pem-6"; product: "PEM-6"; Ciona savignyi 
pem-6 (posterior end mark 6) mRNA, complete cds., N = 1, Score = 421, P 
= 1.7e-39 


>TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus zinc 
finger protein ZNF216 mRNA, complete cds. 
Length = 213 


HSPs: 


Score = 590 (88.5 bits), Expect = 2.1e-57, P = 2.1e-57 
Identities = 123/213 (57%), Positives = 146/213 (68%) 


Query: 1 MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPAT--SVSS 57 
         MAQETN + PMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQ +S GR+SP T SB 
Sbjct: 1 MAQETNQTPGPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQQNS-GRMSPMGTASGSNSP 59 

Query: 58 
         S+S VQ D + + A STS PV+ ++ S+ D 
Sbjct: 60 

Query: 116 -KPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVH 173 
         +E V S + QPS QS K E PK KKNRCFMCRKKVGLTGF+CRCGN++CG+H 
Sbjct: 119 

Query: 174 
         RYSD NC Y+YKA+AA KIRKENPVV EKIQ+I 
Sbjct: 179 


Pedant information for DKFZphfbr2_16f21, frame 1 


Report for DKFZphfbr2_16f21.1 


[LENGTH] 208 
[MW] 22541.23 
[pI] 6.80 
[HOMOL] TREMBL:AF062072_1 gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc finger protein 216 (ZNF216) gene, complete cds. 9e-57 
[PIRKW] zinc 8e-13 
[PIRKW] zinc finger 8e-13 
[PIRKW] fusion protein 8e-13 
[SUPFAM] unassigned ubiquitin-related proteins 8e-13 
[SUPFAM] ubiquitin homology 8e-13 
[PROSITE] MYRISTYL 2 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 4 
[KW] Irregular 
[KW] LOW_COMPLEXITY 7.21 % 


SEQ MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE 
SEG 
PRD ccccccccccccccccccccccccccccccchhhhhhhhhhccccccccccccccccccc 

SEQ SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV 
SEG xxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN 
SEG 
PRD cccccccccccccccccccccccccccceeecccccccceeecccccccccccccccccc 

SEQ CSYNYKADAAEKIRKENPVVVGEKIQKI 
SEG 
PRD ccchhhhhhhhhhhhhcccccccccccc 


Prosite for DKFZphfbr2_16f21.1 


PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 
PS00001 42->45 ASN_GLYCOSYLATION PDOC00001 
PS00001 92->96 ASN_GLYCOSYLATION PDOC00001 
PS00001 180->184 ASN_GLYCOSYLATION PDOC00001 
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006 
PS00006 70->74 CK2_PHOSPHO_SITE PDOC00006 
PS00006 76->80 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00008 22->28 MYRISTYL PDOC00008 
PS00008 166->172 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16f21.1) 
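
The PROSITE rows above can be reproduced by scanning the peptide with the published pattern. As an illustration, the Python sketch below scans the 208-residue peptide of this clone for the N-glycosylation pattern N-{P}-[ST]-{P} (PS00001) and reports 1-based start positions. The regular-expression rendering of the pattern and the function name are assumptions made for this example; the program that generated the table is not described in this document.

```python
import re

# Peptide for frame 1, copied from the "Peptide information" listing above.
PEPTIDE = (
    "MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISP"
    "PATSVSSLSESLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESV"
    "ASSQLDSTSVDKAVPETEDVQASVSDTAQQPSEEQSKPLEKPKQKKNRCF"
    "MCRKKVGLTGFECRCGNVYCGVHRYSDVLNCSYNYKADAAEKIRKENPVV"
    "VGEKIQKI"
)

# PS00001, N-{P}-[ST]-{P}, written as a lookahead so overlapping sites are found.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_starts(peptide: str) -> list[int]:
    """Return 1-based start positions of candidate N-glycosylation sites."""
    return [m.start() + 1 for m in N_GLYC.finditer(peptide)]


print(n_glycosylation_starts(PEPTIDE))
```

The start positions found this way can be compared with the PS00001 rows of the table; such short patterns occur frequently by chance, so a match indicates only a candidate site.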
DKFZphfbr2_16gl8 


group: cell cycle 

DKFZphfbr2_16gl8.3 encodes a novel 984 amino acid protein with similarity to centromeric 
proteins of yeasts. 

The novel protein shows similarity to S. pombe SPAC17A5.07c and the S. cerevisiae Smt4p 
suppressor of MIF2 gene. MIF2 encodes a centromeric protein with homology to the mammalian 
centromeric protein CENP-C. Mutations in MIF2 stabilise dicentric minichromosomes and confer 
high instability to chromosomes that bear a cis-acting mutation in element I of the yeast 
centromeric DNA (CDEI). Therefore the new protein should be involved in centromere 
organisation, too. 

The new protein can find application in modulating/blocking the cell cycle and influencing the 
behavior of chromosomes, both natural and artificial, in eukaryotic cells. 


similarity to KIAA0797 and yeast Smt4p 
complete cDNA, complete cds, EST hits 

the yeast Smt4 protein seems to be involved in centromere function 
and microtubule organisation 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 4826 bp 

Poly A stretch at pos. 4756, polyadenylation signal at pos. 4736 


1 GGGTCGAGGT CGACGGTATC GATAAGTTTT TTTTTTTTTT TTTTTTTTTT 
51 TTTTCCTTTC CCCTCCCCCT CCCTCTCCAA GCCGGAGGGG TCCTGAGGTG 
101 ACAGCGCCTG CAACTGAAAT TTCAGCAGCG GGAGAAGATG GACAAGAGAA 
151 AGCTCGGGCG ACGGCCATCT TCATCCGAAA TCATCACAGA AGGAAAAAGG 
201 AAAAAGTCAT CTTCTGATTT ATCGGAGATA AGAAAGATGT TAAATGCAAA 
251 ACCAGAGGAT GTCCATGTTC AATCACCACT GTCCAAATTC AGAAGCTCAG 
301 AACGCTGGAC TCTCCCTTTG CAGTGGGAAA GAAGCCTAAG GAATAAAGTC 
351 ATCTCTCTAG ACCATAAAAA TAAAAAACAT ATCCGAGGGT GTCCTGTTAC 
401 TTCCAGGTCA TCACCAGAAA GGATACCCAG AGTTATATTG ACGAATGTCC 
451 TGGGAACGGA GTTAGGAAGA AAATACATAA GGACCCCACC TGTAACTGAG 
501 GGAAGTTTGA GTGATACAGA CAACTTGCAA TCAGAGCAAC TTTCTTCATC 
551 ATCTGATGGC AGCCTAGAAT CTTATCAAAA TCTAAACCCT CACAAGAGCT 
601 GTTATTTATC TGAAAGGGGC TCACAACGAA GTAAGACAGT AGATGACAAT 
651 TCTGCAAAGC AGACTGCGCA CAATAAAGAA AAACGAAGAA AGGATGATGG 
701 CATTTCTCTT TTAATATCTG ATACTCAGCC TGAAGACCTT AACAGTGGAA 
751 GTAGAGGTTG TGATCATCTC GAACAGGAAA GCAGAAACAA GGATGTTAAA 
801 TATTCTGATT CAAAAGTGGA ACTCACTCTG ATTTCCAGGA AGACAAAGAG 
851 AAGGCTTAGA AATAATTTAC CTGATTCTCA ATATTGTACT TCTTTGGATA 
901 AGTCAACAGA ACAGACAAAA AAACAAGAAG ATGACTCAAC AATATCCACT 
951 GAGTTTGAAA GGCCAAGTGA AAACTATCAT CAGGATCCAA AACTGCCTGA 
1001 AGAAATTACA ACTAAACCTA CAAAAAGTGA TTTTACTAAG CTATCCTCAC 
1051 TTAACAGTCA GGAGTTGACT TTGAGTAATG CCACCAAAAG TGCCTCTGCC 
1101 GGTTCAACCA CTGAAACCGT TGAGTACTCT AATTCCATTG ATATTGTGGG 
1151 GATTTCTTCC CTGGTTGAGA AGGATGAGAA TGAGTTGAAT ACCATAGAAA 
1201 AGCCTATTCT AAGAGGACAT AATGAAGGGA ACCAATCACT GATCTCAGCT 
1251 GAACCAATTG TTGTTTCCAG TGATGAAGAA GGACCTGTTG AACATAAAAG 
1301 TTCAGAAATT CTTAAGTTAC AATCTAAGCA AGACCGTGAG ACAACTAATG 
1351 AAAATGAGAG TACTTCTGAA TCAGCATTGT TAGAACTACC ATTGATTACA 
1401 TGTGAATCTG TACAGATGTC ATCTGAATTA TGCCCATATA ATCCTGTCAT 
1451 GGAGAACATT TCCAGTATTA TGCCTAGTAA TGAGATGGAT CTACAACTGG 
1501 ATTTTATATT TACTTCTGTT TATATTGGTA AAATAAAAGG AGCTTCTAAA 
1551 GGTTGTGTTA CAATCACAAA AAAATATATT AAGATCCCAT TTCAAGTGTC 
1601 CCTGAATGAG ATTTCATTGC TAGTGGATAC CACACATTTA AAGCGGTTTG 
1651 GGTTATGGAA AAGTAAGGAT GATAATCACA GTAAAAGGAG TCATGCTATT 
1701 CTTTTCTTCT GGGTCTCTTC AGATTATCTT CAAGAGATTC AGACCCAATT 
1751 AGAACACTCT GTATTAAGCC AGCAATCAAA ATCTAGTGAA TTCATTTTCC 
1801 TTGAACTACA CAATCCTGTT TCACAGAGAG AAGAATTGAA GCTGAAAGAT 
1851 ATTATGACGG AAATAAGTAT AATCAGTGGA GAATTAGAGC TTTCTTACCC 
1901 GTTGTCTTGG GTTCAGGCAT TTCCTTTGTT TCAGAACCTC TCTTCAAAAG 
1951 AAAGTTCTTT TATTCATTAT TACTGTGTTT CAACTTGTTC TTTCCCTGCT 
2001 GGTGTTGCTG TTGCTGAAGA AATGAAGCTG AAATCAGTAT CTCAGCCCTC 
2051 AAACACAGAT GCGGCCAAGC CTACTTACAC CTTCCTGCAG AAGCAAAGTA 
2101 GCGGTTGCTA CTCCCTTTCT ATTACATCTA ATCCAGATGA AGAATGGCGG 
2151 GAAGTCAGGC ACACTGGACT TGTTCAGAAG TTGATTGTAT ATCCTCCACC 
2201 ACCTACTAAG GGGGGATTGG GAGTAACTAA TGAAGATCTG GAGTGTTTAG 
2251 AAGAAGGAGA GTTTCTTAAT GATGTAATCA TTGATTTTTA CCTTAAGTAT 
2301 CTTATATTGG AGAAGGCATC AGATGAACTT GTTGAACGAA GTCACATTTT 
2351 TAGTAGCTTT TTCTATAAAT GCTTGACAAG AAAGGAAAAT AATTTAACAG 
2401 AAGATAATCC AAATCTTTCA ATGGCACAGA GAAGACATAA AAGAGTAAGA 
2451 ACATGGACTC GTCACATAAA CATTTTTAAT AAAGATTACA TCTTTGTACC 
2501 TGTAAATGAG TCGTCTCACT GGTATCTCGC AGTCATTTGT TTTCCATGGT 
2551 TAGAAGAAGC TGTGTATGAA GATTTTCCAC AAACTGTATC CCAGCAGTCC 
2601 CAGGCTCAGC AGTCCCAAAG TGACAACAAA ACAATAGATA ATGATCTACG 
2651 TACTACTTCG ACACTGTCTT TGAGTGCAGA GGATTCCCAA AGTACCGAGT 
2701 CGAATATGTC AGTACCAAAG AAAATGTGTA AAAGGCCATG TATTCTTATA 
2751 CTAGACTCCT TGAAAGCTGC TTCTGTACGA AACACAGTTC AGAATTTACG 
2801 AGAGTATTTA GAGGTAGAGT GGGAAGTTAA ACTAAAAACT CATCGTCAAT 
2851 TCAGCAAAAC AAACATGGTG GATCTATGCC CTAAAGTTCC TAAACAGGAC 
2901 AATAGCAGTG ATTGTGGAGT ATATTTATTG CAGTATGTGG AAAGCTTCTT 
2951 CAAGGATCCT ATTGTTAACT TTGAACTTCC AATTCATTTG GAGAAGTGGT 
3001 TTCCTCGTCA TGTAATAAAG ACCAAACGGG AAGATATTCG AGAGCTCATC 
3051 TTGAAACTTC ATTTACAGCA ACAGAAGGGC AGCAGTAGCT AGTTAATCTG 
3101 TACAAACATG ACACAGATGT TCTCTAAGAT TACTGGAAAG CCCCTTACCA 
3151 GCATTTGTGT TAGCCAGCTC ACAGAGAAGA AAATAACTTG CAGTAGTTTT 
3201 ATAATAAGTC ATTGGAACAT TATTTAAAAT ATGTAGGACA CATTATTAGA 
3251 ATTGTTGGGA TCTCATAGAT GGAATGGGAA TGGGGGTGAT ATAGATAAAC 
3301 TTACTAGATA TAAATTAAAA TTTTATAAAT ATTTCATATT TTTCTGAGTA 
3351 AATATGATTG GATTATGCAA CAGCATATGT AATATGGGAA TGTTTTGTAG 
3401 ATAATAAAAC TTACATGATC TGTACTTCCA CGTGACTGGG TGCTGAGGGG 
3451 AGTTAAAGCC TCCCTGGTGC CAGCCCCAGT GCTTGTCAAA TTTGCTGACA 
3501 GGTCACATCA TATTGTAATT CTATTCTTTG CAGCTCAAGC ATGCAGTATG 
3551 AATACTGTGT ATTTTTTAAA AAAATAATTT AGTATCAAGG CTTCAGAAAA 
3601 TGCCATTTAC GGCATCCCTT CTGTATGTAA CAAAAAGACA TTCATAATGT 
3651 TAGGAAGATG ATAAAAATTC GCTCTTTTAA AGTGCAGCTT ATTATTCTCA 
3701 ATTGCTAAAT ACGATTACTC TGCTTTTTTT TTTTCATTTC TTTTGATGTC 
3751 ATATGTGAGT ATCTTATAAT TTAGTTCATT TGTTCAGGGT AAAATTTGAA 
3801 ACAAAAAATT TTACCTGTGC AAAATAGTTT TTTAAAAATT ATACATGTAG 
3851 CTCAACTTGA GGTACTGCTA TATAAATATT CACTCACATT ATCACGGAAT 
3901 TTATGTATAG TTTCTCTAAT ATAGAAGATA AAATTGGTGT CCTCATAACT 
3951 TTAACAAAGA AAACCCTCAG TCCTATTTAT TAATGGGTAG AATTAAATAT 
4001 ATAATTTTAT AGCTCAGTTT ACCCAGTATT CATCTGCAAA GCCAGATTGC 
4051 TCTCATTGCT TTTATATTTT TAAATTGTAG CTTTTAGAGA CCTATGATCC 
4101 TCATGGAACT TAATTTTTTA TTAAATATTC AGGTAACAGT TCTGAATTCA 
4151 TGTGATAATG GTGGCATTAT ATATGATT/Ui ACACTTCAGA ACTTTCTAAT 
4201 GTTATCAGGA GTATTTTGAG GGAGATATGA TTATATTGTA TTTTCTCAGA 
4251 TAAGAAAAAT GTTTTTTAAC AATATTATTT TAATCTGTTT TAAGCATCTC 
4301 TTAGATTTAC ATTATAACTA CATAAAGCAG TGAAGCAAAG GCAAATTAAG 
4351 ATAAAGCTAG AAAGTCTGAA CATTTTATTT CAAAATCATA CGAATCGGGG 
4401 TCAGTTAAGC CTCAGTATTC TTAGCTTTTG TTGATTTTGG CACTATCTTT 
4451 ATATTATTAA ATATATTTGT TGTTTGGATA TTTCATATAA AGATGGCTAT 
4501 AATTACATAT TTCATTCCCA ATTTGTGTGT GTTGGGGGGT ACTTTTAAAG 
4551 GTGACTATTG TTTTGTACAT CTAATTTTGG GAAACCAAGT CTATAAGACA 
4601 TCTTGTGATT TCTTAATGTT TTTGTTTGTA TGTTTTTCAA AGATATCACT 
4651 GTCCTTTATC ATGTTTTGAA GATTGTTTAA AATTCATTTT CCTAAATTAA 
4701 TGTGCAAGTA ATGTTTTGAG GATATCGGTG TTTTATATTA AACATATTTC 
4751 CAATTCAAAA AAAAAAAAAA AAAAACTTAT CGATACCGTC GACCTCGATG 
4801 ATGATGATGA TGATGATGAT GTCGAC 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 138 bp to 3089 bp; peptide length: 984 
Category: similarity to known protein 


1 MDKRKLGRRP SSSEIITEGK RKKSSSDLSE IRKMLNAKPE DVHVQSPLSK 
51 FRSSERWTLP LQWERSLRNK VISLDHKNKK HIRGCPVTSR SSPERIPRVI 
101 LTNVLGTELG RKYIRTPPVT EGSLSDTDNL QSEQLSSSSD GSLESYQNLN 
151 PHKSCYLSER GSQRSKTVDD NSAKQTAHNK EKRRKDDGIS LLISDTQPED 
201 LNSGSRGCDH LEQESRNKDV KYSDSKVELT LISRKTKRRL RNNLPDSQYC 
251 TSLDKSTEQT KKQEDDSTIS TEFERPSENY HQDPKLPEEI TTKPTKSDFT 
301 KLSSLNSQEL TLSNATKSAS AGSTTETVEY SNSIDIVGIS SLVEKDENEL 
351 NTIEKPILRG HNEGNQSLIS AEPIVVSSDE EGPVEHKSSE ILKLQSKQDR 
401 ETTNENESTS ESALLELPLI TCESVQMSSE LCPYNPVMEN ISSIMPSNEM 
451 DLQLDFIFTS VYIGKIKGAS KGCVTITKKY IKIPFQVSLN EISLLVDTTH 
501 LKRFGLWKSK DDNHSKRSHA ILFFWVSSDY LQEIQTQLEH SVLSQQSKSS 

551 EFIFLELHNP VSQREELKLK DIMTEISIIS GELELSYPLS WVQAFPLFQN 

601 LSSKESSFIH YYCVSTCSFP AGVAVAEEMK LKSVSQPSNT DAAKPTYTFL 

651 QKQSSGCYSL SITSNPDEEW REVRHTGLVQ KLIVYPPPPT KGGLGVTNED 

701 LECLEEGEFL NDVIIDFYLK YLILEKASDE LVERSHIFSS FFYKCLTRKE 

751 NNLTEDNPNL SMAQRRHKRV RTWTRHINIF NKDYIFVPVN ESSHWYLAVI 

801 CFPWLEEAVY EDFPQTVSQQ SQAQQSQSDN KTIDNDLRTT STLSLSAEDS 

851 QSTESNMSVP KKMCKRPCIL ILDSLKAASV RNTVQNLREY LEVEWEVKLK 

901 THRQFSKTNM VDLCPKVPKQ DNSSDCGVYL LQYVESFFKD PIVNFELPIH 

951 LEKWFPRHVI KTKREDIREL ILKLHLQQQK GSSS 

BLASTP hits 

Entry SPAC17A5_7 from database TREMBL: 

"SPAC17A5.07c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c17A5, Schizosaccharomyces pombe (fission 
yeast) 

Length = 652 

Score = 275 (96.8 bits), Expect = 1.9e-29, Sum P(3) = 1.9e-29 
Identities = 56/120 (46%), Positives = 78/120 (65%) 

Entry S49947 from database PIR: 

SMT4 protein - yeast (Saccharomyces cerevisiae) 
Length = 1034 

Score = 163 (57.4 bits), Expect = 4.6e-16, Sum P(3) = 4.6e-16 
Identities = 46/159 (28%), Positives = 76/159 (47%) 

Entry YQG6_CAEEL from database SWISSPROT: 
HYPOTHETICAL 35.7 KD PROTEIN C41C4.6 IN CHROMOSOME II. 
Length = 342 

Score = 162 (57.0 bits), Expect = 6.1e-13, Sum P(3) = 6.1e-13 
Identities = 37/119 (31%), Positives = 62/119 (52%) 

Entry AB018340_1 from database TREMBL: 

gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for 
KIAA0797 protein, partial cds. 

Score = 540, P = 1.9e-50, identities = 120/243, positives = 155/243 


Alert BLASTP hits for DKFZphfbr2_16gl8, frame 3 

TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 
Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII 
project), N = 2, Score = 239, P = 2.1e-18 


>TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 
Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII project) 
Length = 710 

HSPs: 

Score = 239 (35.9 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 51/135 (37%), Positives = 78/135 (57%) 

Query: 683 IVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLKYLILEKASDELVERSHIFSSFF 742 

+VYP + V -J-D+E L+ F+ND IIDFY+KYL + S + R H F+ FF 

Sbjct: 176 LVYPQGEPDAW-VRKQDIELLKPRRFINDTIIDFYIKYL-KNRISPKERGRFHFFNCFF 233 

Query: 743 YKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIFNKDYIFVPVNESSHWYLAVICF 802 

+ • RK NL + P+ + ++RV+ WT+++++F KDYIF+P+N S HW L +IC 
Sbjct: 234 F RKLANLDKGTPSTCGGREAYQRVQKWTKNVDLFEKDYIFIPINCSFHWSLVIICH 289 

Query: 803 PWLEEAVYEDFPQTV 817 

P + + PQ V 

Sbjct: 290 PGELVPSHVENPQRV 304 

Score = 70 (10.5 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 13/28 (46%), Positives = 15/28 (53%) 

Query: 948 PIHLEKWFPRHVIKTKREDIRELILKLH 975 

P HL WFP KR +I EL+ LH 

Sbjct: 403 PSHLRNWFPAKEASLKRRNILELLYNLH 430 


Pedant information for DKFZphfbr2_16gl8, frame 3 


Report for DKFZphfbr2_16gl8.3 
[LENGTH] 984 
[MW] 112265.80 
[pI] 6.13 
[HOMOL] TREMBL:AB018340_1 gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for KIAA0797 protein, partial cds. 8e-53 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL031w] 9e-17 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL020c] 4e-06 
[BLOCKS] BL00494C Bacterial luciferase subunits proteins 
[PROSITE] AMIDATION 3 
[PROSITE] MYRISTYL 9 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 30 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 19 
[PROSITE] ASN_GLYCOSYLATION 12 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.47 % 


SEQ MDKRKLGRRPSSSEIITEGKRKKSSSDLSEIRKMLNAKPEDVHVQSPLSKFRSSERWTLP 
SEG 
PRD ccccceeecccceeeeecccccccccchhhhhhhhhhccccccccccccccccccccchh 

SEQ LQWERSLRNKVISLDHKNKKHIRGCPVTSRSSPERIPRVILTNVLGTELGRKYIRTPPVT 
SEG 
PRD hhhhhhhhhheeeeccccceeeccccccccccccceeeeeeeeeccceeeccceeecccc 

SEQ EGSLSDTDNLQSEQLSSSSDGSLESYQNLNPHKSCYLSERGSQRSKTVDDNSAKQTAHNK 
SEG xxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ EKRRKDDGISLLISDTQPEDLNSGSRGCDHLEQESRNKDVKYSDSKVELTLISRKTKRRL 
SEG 
PRD hhhhcccceeeeecccccccccccccccccccccccccccccccccceeeeeehhhhhhh 

SEQ RNNLPDSQYCTSLDKSTEQTKKQEDDSTISTEFERPSENYHQDPKLPEEITTKPTKSDFT 
SEG 
PRD hccccccccccccccccchhhhhccccccccccccccccccccccccccccccccccccc 

SEQ KLSSLNSQELTLSNATKSASAGSTTETVEYSNSIDIVGISSLVEKDENELNTIEKPILRG 
SEG 
PRD ccccccccceeehhhhhhhcccccceeeeccceeeceeeccchhhhhhhhhhhcccccc^ 

SEQ HNEGNQSLISAEPIVVSSDEEGPVEHKSSEILKLQSKQDRETTNENESTSESALLELPLI 
SEG xxxxxxxxxxxxxxxxx 
PRD cccccceeeecceeeeecccccccccchhhhhhhhhhhhhhcccccccchhhhhccccce 

SEQ TCESVQMSSELCPYNPVMENISSIMPSNEMDLQLDFIFTSVYIGKIKGASKGCVTITKKY 
SEG 
PRD eecccccccccccccccccceeeccccchhhhhhheeeeeeeeeeeeccccceeeeeeee 

SEQ IKIPFQVSLNEISLLVDTTHLKRFGLWKSKDDNHSKRSHAILFFWVSSDYLQEIQTQLEH 
SEG 
PRD eeeeccccceeeeeeecccceeeeeeeecccccccccceeeeeeeeccchhhhhhhhhhh 

SEQ SVLSQQSKSSEFIFLELHNPVSQREELKLKDIMTEISIISGELELSYPLSWVQAFPLFQN 
SEG 
PRD hhhhccccceeeeeeeeccccccchhhhhhhhhheeeeeccceeeeccceeeeeeceeec 

SEQ LSSKESSFIHYYCVSTCSFPAGVAVAEEMKLKSVSQPSNTDAAKPTYTFLQKQSSGCYSL 
SEG 
PRD ccccccccceeeeecccccccchhhhhhhhhhhcccccccccccccceeeecccccccce 

SEQ SITSNPDEEWREVRHTGLVQKLIVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLK 
SEG 
PRD eeccccccceeeeeeccceeeeeeecccccccccccccchhhhhhhhccchhhhh^^ 

SEQ YLILEKASDELVERSHIFSSFFYKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIF 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhl^lihh^ 

SEQ NKDYIFVPVNESSHWYLAVICFPWLEEAVYEDFPQTVSQQSQAQQSQSDNKTIDNDLRTT 
SEG xxxxxxxxxxx 
PRD cceeeeeccccccceeeeeeeccchhhhhhhccccchhhhhhhhhhcccccccccccccc 

SEQ STLSLSAEDSQSTESNMSVPKKMCKRPCILILDSLKAASVRNTVQNLREYLEVEWEVKLK 
SEG 
PRD cceeeeecccccceeeccccccccccceeeeeccccccccchhhhhhhhhhh^^ 

SEQ THRQFSKTNMVDLCPKVPKQDNSSDCGVYLLQYVESFFKDPIVNFELPIHLEKWFPRHVI 
SEG 
PRD hhhhhccccccccccccccccccccceeeeehhhhhhhcccceeecccccccccccchhh 

SEQ KTKREDIRELILKLHLQQQKGSSS 
SEG 
PRD hhhhhhhhhhhhhhhhhhhccccc 


Prosite for DKFZphfbr2_16gl8.3 


PS00001 314->318 ASN_GLYCOSYLATION PDOC00001 
PS00001 365->369 ASN_GLYCOSYLATION PDOC00001 
PS00001 406->410 ASN_GLYCOSYLATION PDOC00001 
PS00001 440->444 ASN_GLYCOSYLATION PDOC00001 
PS00001 513->517 ASN_GLYCOSYLATION PDOC00001 
PS00001 600->604 ASN_GLYCOSYLATION PDOC00001 
PS00001 752->756 ASN_GLYCOSYLATION PDOC00001 
PS00001 759->763 ASN_GLYCOSYLATION PDOC00001 
PS00001 790->794 ASN_GLYCOSYLATION PDOC00001 
PS00001 830->834 ASN_GLYCOSYLATION PDOC00001 
PS00001 856->860 ASN_GLYCOSYLATION PDOC00001 
PS00001 922->926 ASN_GLYCOSYLATION PDOC00001 
PS00004 8->12 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 21->25 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00005 162->165 PKC_PHOSPHO_SITE PDOC00005 
PS00005 172->175 PKC_PHOSPHO_SITE PDOC00005 
PS00005 233->236 PKC_PHOSPHO_SITE PDOC00005 
PS00005 236->239 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 291->294 PKC_PHOSPHO_SITE PDOC00005 
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005 
PS00005 515->518 PKC_PHOSPHO_SITE PDOC00005 
PS00005 562->565 PKC_PHOSPHO_SITE PDOC00005 
PS00005 602->605 PKC_PHOSPHO_SITE PDOC00005 
PS00005 747->750 PKC_PHOSPHO_SITE PDOC00005 
PS00005 874->877 PKC_PHOSPHO_SITE PDOC00005 
PS00005 879->882 PKC_PHOSPHO_SITE PDOC00005 
PS00005 901->904 PKC_PHOSPHO_SITE PDOC00005 
PS00005 962->965 PKC_PHOSPHO_SITE PDOC00005 
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 137->141 CK2_PHOSPHO_SITE PDOC00006 
PS00006 167->171 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 225->229 CK2_PHOSPHO_SITE PDOC00006 
PS00006 251->255 CK2_PHOSPHO_SITE PDOC00006 
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006 
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 341->345 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006 
PS00006 402->406 CK2_PHOSPHO_SITE PDOC00006 
PS00006 408->412 CK2_PHOSPHO_SITE PDOC00006 
PS00006 488->492 CK2_PHOSPHO_SITE PDOC00006 
PS00006 509->513 CK2_PHOSPHO_SITE PDOC00006 
PS00006 536->540 CK2_PHOSPHO_SITE PDOC00006 
PS00006 562->566 CK2_PHOSPHO_SITE PDOC00006 
PS00006 602->606 CK2_PHOSPHO_SITE PDOC00006 
PS00006 638->642 CK2_PHOSPHO_SITE PDOC00006 
PS00006 664->668 CK2_PHOSPHO_SITE PDOC00006 
PS00006 697->701 CK2_PHOSPHO_SITE PDOC00006 
PS00006 747->751 CK2_PHOSPHO_SITE PDOC00006 
PS00006 826->830 CK2_PHOSPHO_SITE PDOC00006 
PS00006 846->850 CK2_PHOSPHO_SITE PDOC00006 
PS00006 962->966 CK2_PHOSPHO_SITE PDOC00006 
PS00007 216->223 TYR_PHOSPHO_SITE PDOC00007 
PS00008 84->90 MYRISTYL PDOC00008 
PS00008 106->112 MYRISTYL PDOC00008 
PS00008 141->147 MYRISTYL PDOC00008 
PS00008 161->167 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 468->474 MYRISTYL PDOC00008 
PS00008 505->511 MYRISTYL PDOC00008 
PS00008 622->628 MYRISTYL PDOC00008 
PS00008 693->699 MYRISTYL PDOC00008 
PS00009 6->10 AMIDATION PDOC00009 
PS00009 18->22 AMIDATION PDOC00009 
PS00009 109->113 AMIDATION PDOC00009 

(No Pfam data available for DKFZphfbr2_16gl8.3) 
PCT/IB00/01496 


group: transmembrane protein 

DKFZphfbr2_16il2 encodes a novel 185 amino acid protein, with strong similarity to PUT2 
protein of Fugu rubripes. 

The novel protein contains 1 transmembrane region. 

PUT2 is a Fugu rubripes protein similar to the neural cell adhesion molecule L1 (L1-CAM), a 
mitosis-specific chromosome segregation protein (SMC1) and the calcium channel alpha-1 subunit 
homolog (CCA1). 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 

strong similarity to Fugu rubripes PUT2 
complete cDNA, complete cds, EST hits, 

TRANSMEMBRANE 1 

Sequenced by LMU 

Locus: /map="873.3/875.1 cR from top of Chr1 linkage group" 
Insert length: 1552 bp 

Poly A stretch at pos. 1528, polyadenylation signal at pos. 1506 


1 GGGGGGGGAC AACTGGGTCT TTTGCGGCTG CAGCGGGCTT GTAGGCGTCC 
51 GGCTTTGCTG GCCCAGCAAG CCTGATAAGC ATGAAGCTCT TATCTTTGGT 
101 GGCTGTGGTC GGGTGTTTGC TGGTGCCCCC AGCTGAAGCC AACAAGAGTT 
151 CTGAAGATAT CCGGTGCAAA TGCATCTGTC CACCTTATAG AAACATCAGT 
201 GGGCACATTT ACAACCAGAA TGTATCCCAG AAGGACTGTT GTAGCAACTG 
251 CCTGCACGTG GTGGAGCCCA TGCCAGTGCC TGGCCATGAC GTGGAGGCCT 
301 ACTGCCTGCT GTGCGAGTGC AGGTACGAGG AGCGCAGCAC CACCACCATC 
351 AAGGTCATCA TTGTCATCTA CCTGTCCGTG GTGGGTGCCC TGTTGCTCTA 
401 CATGGCCTTC CTGATGCTGG TGGACCCTCT GATCCGAAAG CCGGATGCAT 
451 ACACTGAGCA ACTGCACAAT GAGGAGGAGA ATGAGGATGC TCGCTCTATG 
501 GCAGCAGCTG CTGCATCCCT CGGGGGACCC CGAGCAAACA CAGTCCTGGA 
551 GCGTGTGGAA GGTGCCCAGC AGCGGTGGAA GCTGCAGGTG CAGGAGCAGC 
601 GGAAGACAGT CTTCGATCGG CACAAGATGC TCAGCTAGAT GGGCTGGTGT 
651 GGTTGGGTCA AGGCCCCAAC ACCATGGCTG CCAGCTTCCA GGCTGGACAA 
701 AGCAGGGGGC TACTTCTCCC TTCCCTCGGT TCCAGTCTTC CCTTTAAAAG 
751 CCTGTGGCAT TTTTCCTCCT TCTCCCTAAC TTTAGAAATG TTGTACTTGG 
801 CTATTTTGAT TAGGGAAGAG GGATGTGGTC TCTGATCTCT GTTGTCTTCT 
851 TGGGTCTTTG GGGTTGAAGG GAGGGGGAAG GCAGGCCAGA AGGGAATGGA 
901 GACATTCGAG GCGGCCTCAG GAGTGGATGC GATCTGTCTC TCCTGGCTCC 
951 ACTCTTGCCG CCTTCCAGCT CTGAGTCTTG GGAATGTTGT TACCCTTGGA 
1001 AGATAAAGCT GGGTCTTCAG GAACTCAGTG TTTGGGAGGA AAGCATGGCC 
1051 CAGCATTCAG CATGTGTTCC TTTCTGCAGT GGTTCTTATC ACCACCTCCC 
1101 TCCCAGCCCC AGCGCCTCAG CCCCAGCCCC AGCTCCAGCC CTGAGGACAG 
1151 CTCTGATGGG AGAGCTGGGC CCCCTGAGCC CACTGGGTCT TCAGGGTGCA 
1201 CTGGAAGCTG GTGTTCGCTG TCCCCTGTGC ACTTCTCGCA CTGGGGCATG 
1251 GAGTGCCCAT GCATACTCTG CTGCCGGTCC CCTCACCTGC ACTTGAGGGG 
1301 TCTGGGCAGT CCCTCCTCTC CCCAGTGTCC ACAGTCACTG AGCCAGACGG 
1351 TCGGTTGGAA CATGAGACTC GAGGCTGAGC GTGGATCTGA ACACCACAGC 
1401 CCCTGTACTT GGGTTGCCTC TTGTCCCTGA ACTTCGTTGT ACCAGTGCAT 
1451 GGAGAGAAAA TTTTGTCCTC TTGTCTTAGA GTTGTGTGTA AATCAAGGAA 
1501 GCCATCATTA AATTGTTTTA TTTCTCTCAA AAAAAAAAAA AAAAAAAATA 
1551 TC 


BLAST Results 


Entry HS808349 from database EMBL: 
human STS WI-11986. 
Score = 1716, P = 5.7e-73, identities = 364/378 

Entry HS487355 from database EMBL: 
human STS WI-13088. 
Score = 1358, P = 1.3e-56, identities = 274/277 


Medline entries 

No Medline entry 


Peptide information for frame 3 


ORF from 81 bp to 635 bp; peptide length: 185 
Category: similarity to unknown protein 
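
The ORF coordinates and the peptide length quoted in these reports are tied together by simple arithmetic: an inclusive, 1-based span of n bases codes for n/3 residues when the stop codon lies outside the span. The following is a minimal sketch of that consistency check, not part of the original analysis pipeline; the function name `orf_peptide_length` is hypothetical, and the stop-codon convention is an assumption inferred from the figures in this listing.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Residues encoded by an ORF given inclusive, 1-based coordinates.

    Assumption: the reported span excludes the stop codon, which is what
    the coordinates in this listing imply.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


if __name__ == "__main__":
    # ORF from 81 bp to 635 bp, as reported for frame 3 above
    print(orf_peptide_length(81, 635))
```

A span that is not a multiple of three raises an error, which makes the check useful for spotting a mis-transcribed coordinate.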


1 MKLLSLVAVV GCLLVPPAEA NKSSEDIRCK CICPPYRNIS GHIYNQNVSQ 

51 KDCCSNCLHV VEPMPVPGHD VEAYCLLCEC RYEERSTTTI KVIIVIYLSV 

101 VGALLLYMAF LMLVDPLIRK PDAYTEQLHN EEENEDARSM AAAAASLGGP 

151 RANTVLERVE GAQQRWKLQV QEQRKTVFDR HKMLS 
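
The peptide above can be recovered from the numbered nucleotide listing by reading codons from the ORF start. The sketch below parses lines in the listing's layout (a start coordinate followed by blocks of ten bases) and translates with the standard genetic code. The helper names are hypothetical and the excerpt covers only bases 51 to 150 of the insert, so only the first 23 residues are translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def parse_numbered_sequence(text):
    """Return (first_position, sequence) from lines like '51 GGCT... GCCC...'."""
    first_pos = None
    chunks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if first_pos is None:
            first_pos = int(fields[0])  # coordinate of the first base
        chunks.extend(fields[1:])       # drop the leading coordinate
    if first_pos is None:
        raise ValueError("no sequence lines found")
    return first_pos, "".join(chunks).upper()


def translate(seq, first_pos, orf_start, n_codons):
    """Translate n_codons codons starting at 1-based position orf_start."""
    offset = orf_start - first_pos
    codons = (seq[offset + 3 * i: offset + 3 * i + 3] for i in range(n_codons))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


# Bases 51-150 of the insert, copied from the listing above.
EXCERPT = """\
51 GGCTTTGCTG GCCCAGCAAG CCTGATAAGC ATGAAGCTCT TATCTTTGGT
101 GGCTGTGGTC GGGTGTTTGC TGGTGCCCCC AGCTGAAGCC AACAAGAGTT
"""

if __name__ == "__main__":
    first_pos, seq = parse_numbered_sequence(EXCERPT)
    print(translate(seq, first_pos, orf_start=81, n_codons=23))
    # prints MKLLSLVAVVGCLLVPPAEANKS
```

Unknown or incomplete codons are rendered as `X`, so OCR damage in a nucleotide line shows up as an `X` in the translation rather than as a silent error.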

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16i12, frame 3 

TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu 
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, 
complete cds; putative protein 1 (PUT1) gene, partial cds; 
mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) 
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) 
and putative protein 2 (PUT2) genes, partial cds, complete sequence., N 
= 1, Score = 655, P = 2.8e-64 

TREMBL:CER12C12_5 gene: "R12C12.6"; Caenorhabditis elegans cosmid 
R12C12., N = 1, Score = 225, P = 1e-18 


>TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu 
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete 
cds; putative protein 1 (PUT1) gene, partial cds; mitosis-specific 
chromosome segregation protein SMC1 homolog (SMC1) gene, complete cds; and 
calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2 
(PUT2) genes, partial cds, complete sequence. 
Length = 187 


HSPs: 


Score = 655 (98.3 bits). Expect = 2.8e-64, P = 2.8e-64 
Identities = 124/163 (76%), Positives = 140/163 (85%) 

Query:  22 KSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHVVEPMPVPGHDVEAYCLLCECR 81 
           KS +D+RCKCICPPYRNISGHIYN+N +QKDC  NCLHVV+PMPVPG+DVEAYCLLCEC+ 
Sbjct:  31 KSFDDVRCKCICPPYRNISGHIYNRNFTQKDC--NCLHVVDPMPVPGNDVEAYCLLCECK 88 

Query:  82 YEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRKPDAYTEQLHNEEENEDARSMA 141 
           YEERST TI+V I+I+LSVVGALLLYM FL+LVDPLIRKPD   + LHNEE++ED + 
Sbjct:  89 YEERSTNTIRVTIIIFLSVVGALLLYMLFLLLVDPLIRKPDPLAQTLHNEEDSEDIQPQM 148 

Query: 142 AAAASLGGP-RANTVLERVEGAQQRWKLQVQEQRKTVFDRHKML 184 
           +     G P R NTVLERVEGAQQRWK QVQEQRKTVFDRHKML 
Sbjct: 149 S-----GDPARGNTVLERVEGAQQRWKKQVQEQRKTVFDRHKML 187 
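
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded: 140/163 is about 85.9%, yet the report shows 85%. The short sketch below reproduces the printed figures under that assumption; the function name `blast_percent` is hypothetical, and the truncation rule is inferred from the numbers in this listing rather than taken from the BLAST documentation.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage as printed in the HSP header (truncated, not rounded).

    Assumption: truncation is inferred from the reported values, e.g.
    140/163 -> 85 and 392/405 -> 96.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


if __name__ == "__main__":
    print(blast_percent(124, 163), blast_percent(140, 163))
```

Recomputing these values is a quick way to catch digits that were damaged in transcription of an HSP header.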

Pedant information for DKFZphfbr2_16i12, frame 3 

Report for DKFZphfbr2_16i12.3 


[LENGTH] 185 
[MW] 20764.29 
[pI] 6.21 
[HOMOL] TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu rubripes 
neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1 
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) 
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2 
(PUT2) genes, partial cds, complete sequence. 3e-68 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 2 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] SIGNAL_PEPTIDE 21 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 2.70 % 


SEQ MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV 
SEG 
PRD ccceeeeeeeeccccccccccccccceeeeeecccccccccceeeccccccccccceeee 
MEM 

SEQ VEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRK 
SEG 
PRD eecccccccccchhhhhhhhhhhhccccceeeeeeehhhhhhhhhhhhhhhhhhhccccc 
MEM ............................MMMMMMMMMMMMMMMMMMMMMMMMMMMMM... 

SEQ PDAYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDR 
SEG ....................xxxxx................................... 
PRD ccchhhhhhhhhcccchhhhhhhhhhccccccchhhhhhhchhhhhhhhhhhhhhhhhhh 
MEM 

SEQ HKMLS 
SEG 
PRD hhccc 
MEM 

Prosite for DKFZphfbr2_16i12.3 


PS00001 21->25 ASN_GLYCOSYLATION PDOC00001 
PS00001 38->42 ASN_GLYCOSYLATION PDOC00001 
PS00001 47->51 ASN_GLYCOSYLATION PDOC00001 
PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005 
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005 
PS00006 23->27 CK2_PHOSPHO_SITE PDOC00006 
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006 
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006 
PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006 
PS00008 148->154 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16i12.3) 
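
Short Prosite site patterns such as these can be located with ordinary regular expressions. The sketch below scans the first 60 residues of the frame 3 peptide for two of the patterns, N-{P}-[ST]-{P} (PS00001, N-glycosylation) and [ST]-x-[RK] (PS00005, protein kinase C phosphorylation). The function name `scan` and the dictionary layout are hypothetical; reporting the end coordinate as start plus pattern width is an assumption chosen to match the ranges printed in the table above.

```python
import re

# Regex form of two Prosite patterns and their widths in residues.
PATTERNS = {
    "ASN_GLYCOSYLATION": ("N[^P][ST][^P]", 4),  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": ("[ST].[RK]", 3),       # PS00005: [ST]-x-[RK]
}


def scan(peptide, name):
    """Return (start, end) pairs, 1-based start, end = start + width."""
    regex, width = PATTERNS[name]
    # A lookahead reports overlapping sites as well.
    return [(m.start() + 1, m.start() + 1 + width)
            for m in re.finditer("(?=" + regex + ")", peptide)]


# Residues 1-60 of the frame 3 peptide, copied from the listing.
PEPTIDE_1_60 = "MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV"

if __name__ == "__main__":
    print(scan(PEPTIDE_1_60, "ASN_GLYCOSYLATION"))  # [(21, 25), (38, 42), (47, 51)]
    print(scan(PEPTIDE_1_60, "PKC_PHOSPHO_SITE"))   # [(49, 52)]
```

Hits of this kind occur frequently by chance in any protein, which is consistent with the report's own remark that no predictive Prosite motif was found.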

group: brain derived 

DKFZphfbr2_16k22 encodes a novel 108 amino acid protein with very weak similarity to 
thioredoxin of Bacillus subtilis. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


weak similarity to thioredoxin 

complete cDNA, complete cds, genomic DNA? 

no EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2088 bp 

Poly A stretch at pos. 2065, no polyadenylation signal found 

1 AAAAGGAAGA AGGAAATAAG GATATTTCAA GGGTTACCAA AGTCGAGGAA 
51 AACTATTTTA AGAAGAAATC TGAATTATTT GTGCACATAG GTTGTAATAA 
101 TAGCATCTTG CATTAAATGG TGTTTTCTAG CTTACAAAGT GGATTCATAT 
151 ACACTATTGT AACTGACTCT CTACAAACTT GCAAGGTTAG CAAGACAAAT 
201 GGTATTTTAA GATAACAAAC TGAGACTCAA AAAAGGCAAG TAACTCGTTC 
251 TACTTCCCAA AGCCAGAAAG TGGCAAAATA GAAAATGGAT CCTGAATCTC 
301 CAACACCATG CAAACTAAGA GAGGGAATCC TCTGTAGAGG GAATGGAAGT 
351 AAAAAGGCAC AAGTGGTGAT GTCACCTTCT GAACAGAGAT GGAACTTTTC 
401 TTCCTCTGAG AAAAAAGAGA AAAGATAGTT TTAAGTGGCA AAAGAACATG 
451 AAGCAATGTG AGGTGAAGAA ACAGAAAAGA CTATGGATGG AATTCCTAGA 
501 TGTGAGATAC ACAAAGTTCC ATTTCAAAGA GAAATATCTA TAGATAGGCA 
551 TAAAGTTACA CACCTGAACT ACCAACTCTG AACCAGTAAC TCAAGAGATA 
601 TTTTGTGTGT CCCACAAGCC ATATGGCTCT GGGGACAAAT TATCTGAAAG 
651 TGCCCAATAA GAAAAATATT TGAGGAAGGG GAGTTGGTGA GTGAATGAAT 
701 TAAAGGACAT CAGAAAGATA CATTGACTGT TCTCCTTCCC AGGAAACAAA 
751 GTGGCTAAGT CAAAACAACG GGCAGCTGTG GGATAGCAAA GAAAAAAAAA 
801 CTTCCAGGCC CAGGTTCTAG TGAAAGCTAC TATGGAAGTT AGCCACTCAA 
851 CTTTAGAACC AGAGGCTTCT TTTCCTCCTC CCTTCTTATC TTTTCTAGTT 
901 TATAGCAAAT TTATATTGAG CCACTTATTC TTTCTGAATG CTAGTTCCCC 
951 TTTAGCATTT CTTTTTCTTC ATTCCCTTTG GACTGGCCCA ATGCTTTGGC 
1001 CCCTTATCAA AGCATTTTCT AAGAAACAGT CTGACAGCTC TAATTTGCAT 
1051 CTGGTTATGC AAGATGTGGT TAAGAACATG GACTCTGGAG GTAAATACAC 
1101 CTTGATTCCA ATTCATTCTC TCATTTATTC ATTCAGCAAA TATTTAGTGA 
1151 ACATCTAACA TGTGCTAGGC ACTGTTCTAG TTGCTGAGGA TACAGCTTCA 
1201 AACAAAATAA GGTCTCTGCA AGGATGCCTT CTCTTACCAC TCCTATTCAG 
1251 CGTAGTATTG GAAGTCCTGG CCAGGGCAAT CAGGCAAGAA AAAGAAATCA 
1301 AGGTCATCCA AATAGGAAGA GAGGAAGTCA AACTATCCCT GTTTACAGAC 
1351 AACATGATCC TACATCTAGA AAAAAACCCA TTGTCTTAGC CCAAAAGCTT 
1401 CTTAGGCTGA TAAACAACTT CAGCAAAGTC TTAGGATACA AAATCCATGT 
1451 GCAAAAAACA CTAGCATTCT TATACACCAA CAACAGTCAA GCCGAGATCC 
1501 AAATCAGGAA CAAACTCCTA TTCACAATTG CCACAAAAAC AATAGAACAG 
1551 GAAAACAGCT AACTAGGAAG GTGAAAGATC TCTACAAGGA GAACTACAAA 
1601 CCACTGCTCA CAGAAATCAG AGATGACACA TATAAATGGA AAAACATTCC 
1651 ATGATCATGG ATAGGAAGAA TGAATATTAC TGAAATGGCT ATACTGTCCA 
1701 AAGCAATTTA TAGATTCAAT GCTATTCCTA GTAAACTACC ATTGAGATTT 
1751 TTTACAGAAC TAGAAAAAAA AAAAACTATT TTAAGGCTGG GCGCAGTGGC 
1801 TCTCACCTGT AATCCCAGCA CTTTGGGAGG CCGAGATGGG TGGATCACGA 
1851 GGTCAGGAGA TGGAAAACAT CCTGGCTAAC ATGGTGAAAC CCCGTCTCTA 
1901 CTAAAAATAC AAAAAATTAG CCAGGCGTGG TGGTGGGCGC CTGTAATCCC 
1951 AGCTGCTCGG GAGGCTGAGG CAGGATAATG GTGTGAACCC GGGAGGCAGA 
2001 GCTTGCAGTG AGCTGAGATT GCACCACTGC ACTCCAGCCT GAGGGACAGA 
2051 GTGAGACTCC ATCTCAAAAA AAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 

No Medline entry 


Peptide information for frame 1 


ORF from 832 bp to 1155 bp; peptide length: 108 
Category: putative protein 


1 MEVSHSTLEP EASFPPPFLS FLVYSKFILS HLFFLNASSP LAFLFLHSLW 
51 TGPMLWPLIK AFSKKQSDSS NLHLVMQDVV KNMDSGGKYT LIPIHSLIYS 
101 FSKYLVNI 

BLASTP hits 
Entry B37192 from database PIR: 

thioredoxin - Bacillus subtilis Score = 71 (25.0 bits). Expect = 0.040. 
P = 0.039 

Identities = 16/49 (32%), Positives = 30/49 (61%) 


Alert BLASTP hits for DKFZphfbr2_16k22, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16k22, frame 1 


Report for DKFZphfbr2_16k22.1 


[LENGTH] 108 
[MW] 12281.47 
[pI] 8.06 
[PROSITE] MYRISTYL 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 


SEQ MEVSHSTLEPEASFPPPFLSFLVYSKFILSHLFFLNASSPLAFLFLHSLWTGPMLWPLIK 

PRD ccccccccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhccccccchhhhh 

SEQ AFSKKQSDSSNLHLVMQDVVKNMDSGGKYTLIPIHSLIYSFSKYLVNI 

PRD hhhcccccccceeehhhhhhcccccccceeeeeccceeeecccccccc 


Prosite for DKFZphfbr2_16k22.1 


PS00001 36->40 ASN_GLYCOSYLATION PDOC00001 
PS00004 64->68 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005 
PS00006 6->10 CK2_PHOSPHO_SITE PDOC00006 
PS00008 86->92 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16k22.1) 



WO 01/12659 
DKFZphfbr2_16l12 



group: transmembrane protein 


DKFZphfbr2_16l12 encodes a novel 267 amino acid protein with similarity to Gallus gallus 
putative transmembrane protein E3-16. 

The novel protein contains one putative transmembrane domain. In chicken, E3-16 is expressed 
specifically in the inner ear. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neurons involved in perception of hearing. 


similarity to Gallus gallus putative transmembrane protein E3-16 
complete cDNA, complete cds, EST hits 

potential start at bp 73 matches Kozak consensus PyCCataG 
TRANSMEMBRANE 1 


Sequenced by Qiagen 
Locus : unknown 


Insert length: 2042 bp 

Poly A stretch at pos. 2024, polyadenylation signal at pos. 2003 
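
The positions reported for the poly A stretch and the polyadenylation signal can be reproduced by a plain string search over the 3' end of the insert. The sketch below looks for the canonical hexamers AATAAA and ATTAAA and for the first long run of A residues. The function names are hypothetical, the minimum run length of 10 is an assumed threshold, and the 0-based offset is an inference: it is the convention under which the search returns the positions quoted in this report.

```python
import re


def find_polya_signal(seq, offset=0):
    """Position of the first AATAAA or ATTAAA hexamer, or None if absent."""
    hits = [i for i in (seq.find("AATAAA"), seq.find("ATTAAA")) if i >= 0]
    return offset + min(hits) if hits else None


def find_polya_stretch(seq, offset=0, min_len=10):
    """Position of the first run of at least min_len A residues (assumed cutoff)."""
    m = re.search("A{%d,}" % min_len, seq)
    return offset + m.start() if m else None


# Last 42 bases of the insert (1-based positions 2001-2042), from the listing below.
TAIL = "AACAATAAAA" "ATCTTTTACT" "TCTGAAAAAA" "AAAAAAAAAA" "AA"

if __name__ == "__main__":
    # 1-based position 2001 corresponds to 0-based index 2000.
    print(find_polya_signal(TAIL, offset=2000))   # 2003
    print(find_polya_stretch(TAIL, offset=2000))  # 2024
```

With a 1-based offset of 2001 the same search would return 2004 and 2025, so the convention matters when comparing against the reported values.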


1 GGGGGCGGCG GAGGCAGAGA CCGAGGCTGC ACCGGCAGAG GCTGCGGGGC 
51 GGACGCGCGG GCCGGCGCAG CCATGGTGAA GATTAGCTTC CAGCCCGCCG 
101 TGGCTGGCAT CAAGGGCGAC AAGGCTGACA AGGCGTCGGC GTCGGCCCCT 
151 GCGCCGGCCT CGGCCACCGA GATCCTGCTG ACGCCGGCTA GGGAGGAGCA 
201 GCCCCCACAA CATCGATCCA AGAGGGGGGG CTCAGTGGGC GGCGTGTGCT 
251 ACCTGTCGAT GGGCATGGTC GTGCTGCTCA TGGGCCTCGT GTTCGCCTCT 
301 GTCTACATCT ACAGATACTT CTTCCTTGCG CAGCTGGCCC GAGATAACTT 
351 CTTCCGCTGT GGTGTGCTGT ATGAGGACTC CCTGTCCTCC CAGGTCCGGA 
401 CTCAGATGGA GCTGGAAGAG GATGTGAAAA TCTACCTCGA CGAGAACTAC 
451 GAGCGCATCA ACGTGCCTGT GCCCCAGTTT GGCGGCGGTG ACCCTGCAGA 
501 CATCATCCAT GACTTCCAGC GGGGTCTGAC TGCGTACCAT GATATCTCCC 
551 TGGACAAGTG CTATGTCATC GAACTCAACA CCACCATTGT GCTGCCCCCT 
601 CGCAACTTCT GGGAGCTCCT CATGAACGTG AAGAGGGGGA CCTACCTGCC 
651 GCAGACGTAC ATCATCCAGG AGGAGATGGT GGTCACGGAG CATGTCAGTG 
701 ACAAGGAGGC CCTGGGGTCC TTCATCTACC ACCTGTGCAA CGGGAAAGAC 
751 ACCTACCGGC TCCGGCGCCG GGCAACGCGG AGGCGGATCA ACAAGCGTGG 
801 GGCCAAGAAC TGCAATGCCA TCCGCCACTT CGAGAACACC TTCGTGGTGG 
851 AGACGCTCAT CTGCGGGGTG GTGTGAGGCC CTCCTCCCCC AGAACCCCCT 
901 GCCGTGTTCC TCTTTTCTTC TTTCCGGCTG CTCTCTGGCC CTCCTCCTTC 
951 CCCCTGCTTA GCTTGTACTT TGGACGCGTT TCTATAGAGG TGACATGTCT 
1001 CTCCATTCCT CTCCAACCCT GCCCACCTCC CTGTACCAGA GCTGTGATCT 
1051 CTCGGTGGGG GGCCCATCTC TGCTGACCTG GGTGTGGCGG AGGGAGAGGC 
1101 GATGCTGCAA AGTGTTTTCT GTGTCCCACT GTCTTGAAGC TGGGCCTGCC 
1151 AAAGCCTGGG CCCACAGCTG CACCGGCAGC CCAAGGGGAA GGACCGGTTG 
1201 GGGGAGCCGG GCATGTGAGG CCCTGGGCAA GGGGATGGGG CTGTGGGGGC 
1251 GGGGCGGCAT GGGCTTCAGA AGTATCTGCA CAATTAGAAA AGTCCTCAGA 
1301 AGCTTTTTCT TGGAGGGTAC ACTTTCTTCA CTGTCCCTAT TCCTAGACCT 
1351 GGGGCTTGAG CTGAGGATGG GACGATGTGC CCAGGGAGGG ACCCACCAGA 
1401 GCACAAGAGA AGGTGGCTAC CTGGGGGTGT CCCAGGGACT CTGTCAGTGC 
1451 CTTCAGCCCA CCAGCAGGAG CTTGGAGTTT GGGGAGTGGG GATGAGTCCG 
1501 TCAAGCACAA CTGTTCTCTG AGTGGAACCA AAGAAGCAAG GAGCTAGGAC 
1551 CCCCAGTCCT GCCCCCCAGG AGCACAAGCA GGGTCCCCTC AGTCAAGGCA 
1601 GTGGGATGGG CGGCTGAGGA ACGGGGCAGG CAAGGTCACT GCTCAGTCAC 
1651 GTCCACGGGG GACGAGCCGT GGGTTCTGCT GAGTAGGTGG AGCTCATTGC 
1701 TTTCTCCAAG CTTGGAACTG TTTTGAAAGA TAACACAGAG GGAAAGGGAG 
1751 AGCCACCTGG TACTTGTCCA CCCTGCCTCC TCTGTTCTGA AATTCCATCC 
1801 CCCTCAGCTT AGGGGAATGC ACCTTTTTCC CTTTCCTTCT CACTTTTGCA 
1851 TGTTTTTACT GATCATTCGA TATGCTAACC GTTCTCAGCC CTGAGCCTTG 
1901 GAGAGGAGGG CTGTAACGCC TTCAGTCAGT CTCTGGGGAT GAAACTCTTA 
1951 AATGCTTTGT ATATTTTCTC AATTAGATCT CTTTTCAGAA GTGTCTATAG 
2001 AACAATAAAA ATCTTTTACT TCTGAAAAAA AAAAAAAAAA AA 


BLAST Results 


No BLAST result 


Medline entries 


96325063: 

Isolation of markers for chondro-osteogenic differentiation 
using cDNA library subtraction. Molecular cloning and 
characterization of a gene belonging to a novel multigene 
family of integral membrane proteins. 


Peptide information for frame 1 


ORF from 73 bp to 873 bp; peptide length: 267 
Category: similarity to known protein 
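
The frame number given in each "Peptide information" heading follows from the ORF start coordinate. A minimal sketch, assuming forward-strand frames numbered 1 to 3 and 1-based coordinates; the function name `reading_frame` is hypothetical.

```python
def reading_frame(start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) of an ORF starting at a 1-based position."""
    if start_bp < 1:
        raise ValueError("coordinates are 1-based")
    return (start_bp - 1) % 3 + 1


if __name__ == "__main__":
    # An ORF starting at bp 73 lies in frame 1, as stated in the heading above.
    print(reading_frame(73))
```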


1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGGSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16l12, frame 1 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16)., N = 1, Score = 573, P = 1.4e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 
1, Score = 559, P = 4.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, 
Score = 452, P = 9.1e-43 

>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16). 

Length = 262 

HSPs: 

Score = 573 (86.0 bits). Expect = 1.4e-55, P = 1.4e-55 
Identities = 118/264 (44%), Positives = 175/264 (66%) 

Query:   1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY 60 
           MVK+SF A+A + A+K ++ ++L+ P + + P+ G C+ 
Sbjct:   1 MVKVSFNSALA--HKEAANKEEENS QVLILPP-DAKEPEDVWPAGHKRAWCW 50 

Query:  61 -LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTQM- 112 
           + G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 
Sbjct:  51 CMCFGLAFMLAGVILGGAYLYKYFAF(Xi GGVYFCGIKYIEDGLSLPESGAQLKSARY 107 

Query: 113 -ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTT 171 
           +E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT+ 
Sbjct: 108 

Query: 172 IVLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLR 231 
           +V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+ 
Sbjct: 168 VVMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQ 227 

Query: 232 RRATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 
           R+   + I KR A NC  IRHFEN F +ETLIC 
Sbjct: 228 RKEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 

Pedant information for DKFZphfbr2_16l12, frame 1 

Report for DKFZphfbr2_16l12.1 

[LENGTH] 267 
[MW] 30223.94 
[pI] 8.16 
[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16) 1e-49 
[PROSITE] PRENYLATION 1 
[PROSITE] MYRISTYL 5 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 15.36 % 


SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY 
SEG xxxxxxxxxxxxxxxx 
PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 
MEM ...................................................MMMMMMMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 
SEG xxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 
MEM MMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 
SEG 
PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 
MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 
SEG ................................................xxxxxxxxxxxx 
PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 
MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 
SEG xx 
PRD hhhhccceeeecccchhhhhheeeccc 
MEM 


Prosite for DKFZphfbr2_16l12.1 

PS00001 169->173 ASN_GLYCOSYLATION PDOC00001 
PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005 
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005 
PS00005 227->230 PKC_PHOSPHO_SITE PDOC00005 
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006 
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 
PS00007 119->127 TYR_PHOSPHO_SITE PDOC00007 
PS00008 52->58 MYRISTYL PDOC00008 
PS00008 53->59 MYRISTYL PDOC00008 
PS00008 71->77 MYRISTYL PDOC00008 
PS00008 138->144 MYRISTYL PDOC00008 
PS00008 243->249 MYRISTYL PDOC00008 
PS00294 264->268 PRENYLATION PDOC00266 


(No Pfam data available for DKFZphfbr2_16l12.1) 

DKFZphfbr2_22f21 



group: brain derived 

DKFZphfbr2_22f21 encodes a novel 567 amino acid protein with weak similarity to C. elegans 
cosmid C18C4.5. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


weak similarity to C. elegans C18C4.5 

EST HSAA6531/HSAA5273 defines splice variant or unspliced cDNA, additional -180 bp at 
position 250 

Sequenced by AGOWA 

Locus: /map="311.4 cR from top of Chr14 linkage group" 
Insert length: 1910 bp 

Poly A stretch at pos. 1887, polyadenylation signal at pos. 1867 


1 TGGGCCCTTA GCAACGGCCT GGCGACGGTT TCCTTGCTGC TGCAGCCCCC 
51 GTCGGCTCCT CTTTTCCAGT CCTCCACTGC CGGGGCTGGG CCCGGCCGCG 
101 GGAAGGACCG AAGGGGATAC AGCGTGTCCC TGCGGCGGCT GCAAGAGGAC 
151 TAAGCATGGA TGGCAGCCGG AGAGTCAGAG CAACCTCTGT CCTTCCCAGA 
201 TATGGTCCAC CGTGCCTATT TAAAGGACAC TTGAGCACCA AAAGTAATGC 
251 TGCAGTAGAC TGCTCGGTTC CAGTAAGCAT GAGTACCAGC ATAAAGTATG 
301 CAGACCAACA ACGAAGAGAG AAACTCAAAA AGGAATTAGC ACAATGTGAA 
351 AAAGAGTTCA AATTAACTAA AACTGCAATG CGAGCCAATT ATAAAAATAA 
401 TTCCAAGTCA CTTTTTAATA CCTTACAAGA GCCCTCAGGC GAACCGCAAA 
451 TTGAGGATGA CATGTTAAAA GAAGAAATGA ATGGATTTTC ATCCTTTGCA 
501 AGGTCACTAG TACCCTCTTC AGAGAGACTA CACCTAAGTC TACATAAATC 
551 CAGTAAAGTC ATCACAAATG GTCCTGAGAA GAACTCCAGT TCCTCCCCGT 
601 CCAGTGTGGA TTATGCAGCC TCCGGGCCCC GGAAACTGAG CTCTGGAGCC 
651 CTGTATGGCA GAAGGCCCAG AAGCACATTC CCAAATTCCC ACCGGTTTCA 
701 GTTAGTCATT TCGAAAGCAC CCAGTGGGGA TCTTTTGGAT AAACATTCTG 
751 AACTCTTTTC TAACAAACAA TTGCCATTCA CTCCTCGCAC TTTAAAAACA 
801 GAAGCAAAAT CTTTCCTGTC ACAGTATCGC TATTATACAC CTGCCAAAAG 
851 AAAAAAGGAT TTTACAGATC AACGGATAGA AGCTGAAACC CAGACTGAAT 
901 TAAGCTTTAA ATCTGAGTTG GGGACAGCTG AGACTAAAAA CATGACAGAT 
951 TCAGAAATGA ACATAAAGCA GGCATCTAAT TGTGTGACAT ATGATGCCAA 
1001 AGAAAAAATA GCTCCTTTAC CTTTAGAAGG GCATGACTCA ACATGGGATG 
1051 AGATTAAGGA TGATGCTCTT CAGCATTCCT CACCAAGGGC AATGTGTCAG 
1101 TATTCCCTGA AGCCCCCTTC AACTCGTAAA ATCTACTCTG ATGAAGAAGA 
1151 ACTGTTGTAT CTGAGTTTCA TTGAAGATGT AACAGATGAA ATTTTGAAAC 
1201 TTGGTTTATT TTCAAACAGG TTTTTAGAAC GACTGTTCGA GCGACATATA 
1251 AAACAAAATA AACATTTGGA GGGGGAAAAA ATGCGCCACC TGCTGCATGT 
1301 CCTGAAAGTA GACTTAGGCT GCACATCGGA GGAAAACTCG GTAAAGCAAA 
1351 ATGATGTTGA TATGTTGAAT GTATTTGATT TTGAAAAGGC TGGGAATTCA 
1401 GAACCAAATA AATTAAAAAA TGAAAGTGAA GTAACAATTC AGCAGGAACG 
1451 TCAACAATAC CAAAAGGCTT TGGATATGTT ATTGTCGGCA CCAAAGGATG 
1501 AGAACGAGAT ATTCCCTTCA CCAACTGAAT TTTTCATGCC TATTTATAAA 
1551 TCAAAGCATT CAGAAGGGGT TATAATTCAA CAGGTGAATG ATGAAACAAA 
1601 TCTTGAAACT TCAACTTTGG ATGAAAATCA TCCAAGTATT TCAGACAGTT 
1651 TAACAGATCG GGAAACTTCT GTGAATGTCA TTGAAGGTGA TAGTGACCCT 
1701 GAAAAGGTTG AGATTTCAAA TGGATTATGT GGTCTTAACA CATCACCCTC 
1751 CCAATCTGTT CAGTTCTCCA GTGTCAAAGG CGACAATAAT CATGACATGG 
1801 AGTTATCAAC TCTTAAAATC ATGGAAATGA GCATTGAGGA CTGCCCTTTG 
1851 GATGTTTAAT CTTCATTAAT AAATACCTCA AATGGCCAGT AAAAAAAAAA 
1901 AAAAAAAAAA 


BLAST Results 


Entry HS477360 from database EMBL: 
human STS WI-14643. 
Length = 418 
Minus Strand HSPs: 

Score = 1850 (277.6 bits). Expect = 2.5e-77, P = 2.5e-77 
Identities = 392/405 (96%), Positives = 392/405 (96%), Strand = Minus / Plus 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 156 bp to 1856 bp; peptide length: 567 
Category: similarity to unknown protein 


1 MDGSRRVRAT SVLPRYGPPC LFKGHLSTKS NAAVDCSVPV SMSTSIKYAD 
51 QQRREKLKKE LAQCEKEFKL TKTAMRANYK NNSKSLFNTL QEPSGEPQIE 
101 DDMLKEEMNG FSSFARSLVP SSERLHLSLH KSSKVITNGP EKNSSSSPSS 
151 VDYAASGPRK LSSGALYGRR PRSTFPNSHR FQLVISKAPS GDLLDKHSEL 
201 FSNKQLPFTP RTLKTEAKSF LSQYRYYTPA KRKKDFTDQR IEAETQTELS 
251 FKSELGTAET KNMTDSEMNI KQASNCVTYD AKEKIAPLPL EGHDSTWDEI 
301 KDDALQHSSP RAMCQYSLKP PSTRKIYSDE EELLYLSFIE DVTDEILKLG 
351 LFSNRFLERL FERHIKQNKH LEGEKMRHLL HVLKVDLGCT SEENSVKQND 
401 VDMLNVFDFE KAGNSEPNKL KNESEVTIQQ ERQQYQKALD MLLSAPKDEN 
451 EIFPSPTEFF MPIYKSKHSE GVIIQQVNDE TNLETSTLDE NHPSISDSLT 
501 DRETSVNVIE GDSDPEKVEI SNGLCGLNTS PSQSVQFSSV KGDNNHDMEL 
551 STLKIMEMSI EDCPLDV 

BLASTP hits 

Entry CEC18C4_3 from database TREMBL: 
"C18C4.5"; Caenorhabditis elegans cosmid C18C4 
Length = 1091 

Score = 98 (34.5 bits). Expect = 0.29, P = 0.25 
Identities = 105/470 (22%), Positives = 192/470 (40%) 


Alert BLASTP hits for DKFZphfbr2_22f21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22f21, frame 3 

Report for DKFZphfbr2_22f21.3 

[LENGTH] 567 
[MW] 64120.02 
[pI] 5.68 
[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 16 
[PROSITE] PKC_PHOSPHO_SITE 18 
[PROSITE] ASN_GLYCOSYLATION 4 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 1.23 % 
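
The LOW_COMPLEXITY keyword appears to give the share of residues masked with `x` in the SEG rows of the report that follows: 7 masked residues in a 567-residue peptide give 1.23 %. The sketch below shows that calculation under this assumption; the function names are hypothetical and the interpretation is inferred from the figures in this listing, not from the Pedant documentation.

```python
def count_masked(seg_rows):
    """Number of residues masked as low complexity ('x') across SEG rows."""
    return sum(row.count("x") for row in seg_rows)


def low_complexity_percent(masked_residues: int, length: int) -> float:
    """Masked residues as a percentage of peptide length, two decimals."""
    if length <= 0:
        raise ValueError("peptide length must be positive")
    return round(100.0 * masked_residues / length, 2)


if __name__ == "__main__":
    masked = count_masked(["xxxxxxx."])          # the only masked SEG row here
    print(low_complexity_percent(masked, 567))   # 1.23
```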

SEQ MDGSRRVRATSVLPRYGPPCLFKGHLSTKSNAAVDCSVPVSMSTSIKYADQQRREKLKKE 

SEG 

PRD cccccceeeeeeccccccccccccccccccceeeecccccccchhhhhhhhhhhhhlihiih 

SEQ LAQCEKEFKLTKTAMRANYKNNSKSLFNTLQEPSGEPQIEDDMLKEEMNGFSSFARSLVP 

SEG 

PRD lihii^^ii^hhhhhhhhhhhhccccccceeecccccccchhhhhhhhlihhcccc^ 

SEQ SSERLHLSLHKSSKVITNGPEKNSSSSPSSVDYAASGPRKLSSGALYGRRPRSTFPNSHR 

SEG xxxxxxx. 

PRD ccchhhhhhhhceeeeccccccccccccccccccccccccccicccccccccccccc^ 

SEQ FQLVISKAPSGDLLDKHSELFSNKQLPFTPRTLKTEAKSFLSQYRYYTPAKRKKDFTDQR 

SEG 

PRD cceeeeeccccccccccccccccccccccccchhhhhhhhhhh 

SEQ IEAETQTELSFKSELGTAETKNMTDSEMNIKQASNCVTYDAKEKIAPLPLEGHDSTWDEI 

SEG 

PRD hhhhhhhhhhhhhhccccccccccchhhhhhhccceeehhhhhhcccccc^ 

SEQ KDDALQHSSPRAMCQYSLKPPSTRKIYSDEEELLYLSFIEDVTDEILKLGLFSNRFLERL 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhh 

SEQ FERHIKQNKHLEGEKMRHLLHVLKVDLGCTSEENSVKQNDVDMLNVFDFEKAGNSEPNKL 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhccccccccccccccccccccceeeecccccccccccc 

SEQ KNESEVTIQQERQQYQKALDMLLSAPKDENEIFPSPTEFFMPIYKSKHSEGVIIQQVNDE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccceeeeecccc 

SEQ TNLETSTLDENHPSISDSLTDRETSVNVIEGDSDPEKVEISNGLCGLNTSPSQSVQFSSV 

SEG 

PRD ccccccccccccccccccccccccceeecccccccceeeeccccccccccccceeeeecc 

SEQ KGDNNHDMELSTLKIMEMSIEDCPLDV 

SEG 

PRD ccccccchhhhhhhhhhhhhccccccc 


Prosite for DKFZphfbr2_22f21.3 


PS00001 81->85 ASN_GLYCOSYLATION PDOC00001 
PS00001 143->147 ASN_GLYCOSYLATION PDOC00001 
PS00001 262->266 ASN_GLYCOSYLATION PDOC00001 
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001 
PS00004 159->163 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005 
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005 
PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005 
PS00005 178->181 PKC_PHOSPHO_SITE PDOC00005 
PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005 
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005 
PS00005 212->215 PKC_PHOSPHO_SITE PDOC00005 
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005 
PS00005 309->312 PKC_PHOSPHO_SITE PDOC00005 
PS00005 317->320 PKC_PHOSPHO_SITE PDOC00005 
PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005 
PS00005 353->356 PKC_PHOSPHO_SITE PDOC00005 
PS00005 395->398 PKC_PHOSPHO_SITE PDOC00005 
PS00005 500->503 PKC_PHOSPHO_SITE PDOC00005 
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005 
PS00005 552->555 PKC_PHOSPHO_SITE PDOC00005 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006 
PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006 
PS00006 264->268 CK2_PHOSPHO_SITE PDOC00006 
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006 
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006 
PS00006 337->341 CK2_PHOSPHO_SITE PDOC00006 
PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006 
PS00006 455->459 CK2_PHOSPHO_SITE PDOC00006 
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006 
PS00006 486->490 CK2_PHOSPHO_SITE PDOC00006 
PS00006 494->498 CK2_PHOSPHO_SITE PDOC00006 
PS00006 498->502 CK2_PHOSPHO_SITE PDOC00006 
PS00006 500->504 CK2_PHOSPHO_SITE PDOC00006 
PS00006 513->517 CK2_PHOSPHO_SITE PDOC00006 
PS00006 559->563 CK2_PHOSPHO_SITE PDOC00006 
PS00008 164->170 MYRISTYL PDOC00008 
PS00008 256->262 MYRISTYL PDOC00008 
PS00008 350->356 MYRISTYL PDOC00008 
PS00009 167->171 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfbr2_22f21.3) 

DKFZphfbr2_22h13 


group: transmembrane protein 

DKFZphfbr2_22h13 encodes a novel 520 amino acid protein with similarity to Drosophila 
melanogaster EG:39E1.3. 

The protein contains an ATP/GTP A Prosite pattern (P-loop). This loop interacts with one of 
the phosphate groups of an A or G nucleotide. It is found in numerous ATP- or GTP-binding 
proteins, such as ATP synthase alpha and beta subunits, myosin heavy chains, kinesin heavy 
chains and kinesin-like proteins, dynamins and dynamin-like proteins, several kinases, DNA and 
RNA helicases, GTP-binding elongation factors and the Ras family of GTP-binding proteins. 
Additionally, the novel protein contains one putative transmembrane domain. 
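
The P-loop is described in Prosite as the ATP_GTP_A pattern (PS00017), [AG]-x(4)-G-K-[ST]. The sketch below converts Prosite pattern syntax into a regular expression and applies it to residues 201 to 230 of the frame 3 peptide listed further down. The function name `prosite_to_regex` is hypothetical, and the converter is deliberately small: it handles plain residues, `x`, bracketed and braced sets and repeat counts, but not the `<` and `>` anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple Prosite pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":                                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


P_LOOP = prosite_to_regex("[AG]-x(4)-G-K-[ST]")

# Residues 201-230 of the frame 3 peptide, copied from the listing.
SEGMENT_201_230 = "QTDVLVVGVLGLQGTGKSMVMSLLSANTPE"

if __name__ == "__main__":
    print(P_LOOP)  # [AG].{4}GK[ST]
    hit = re.search(P_LOOP, SEGMENT_201_230)
    # 1-based start and exclusive end, comparable with "ATP_GTP_A (211-219)"
    print(201 + hit.start(), 201 + hit.end())  # 211 219
```

The match covers the residues GLQGTGKS, which agrees with the motif range given in the peptide information for this clone.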

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 


AC004780_1, differences to predicted gene model 
membrane regions: 1 

complete cDNA, complete cds, EST hits 
on genomic level encoded by AC004780, 
differences to predicted gene model! 
TRANSMEMBRANE 1 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2292 bp 

Poly A stretch at pos. 2272, polyadenylation signal at pos. 2255 


1 GGGGGAGGGA ACTGATCTCA GCTCGGGCCC GCGTTACATC CTCCTCCTCT 
51 TCTTCCTTCG GCCCAGCTTT CCTTAGGGGC TGCAACCCGG ACGCCGAGGC 
101 CGGTTTCGGA GTGGGGAGTG CCCATTTTCT CTCCTTCCCA CGTTCCTGGC 
151 CCCCAGACGC CATTTGCAGG CGGGTGGCTT GGGTCAGCCT CCCCGCCCCC 
201 ACCCGACTCC CGTCACGGGA GAGCGCACAC CGCGCCCCGA GAACCAATCA 
251 GCAGCCGCGT TAGGTAACCA TGTCTGAGTC TGGACACAGT CAGCCTGGAC 
301 TCTATGGGAT AGAGCGGCGG CGACGGTGGA AGGAGCCTGG CTCTGGTGGC 
351 CCCCAGAATC TCTCTGGGCC TGGTGGTCGG GAGAGGGACT ACATTGCACC 
401 ATGGGAAAGA GAGAGAAGGG ATGCCAGCGA AGAGACAAGC ACTTCCGTCA 
451 TGCAGAAAAC CCCCATCATC CTCTCAAAAC CTCCAGCAGA GCGGTCAAAA 
501 CAGCCACCAC CTCCAACAGC CCCTGCTGCC CCGCCTGCTC CAGCCCCTCT 
551 GGAGAAGCCC ATCGTTCTCA TGAAGCCACG GGAGGAGGGG AAGGGGCCTG 
601 TGGCCGTGAC AGGTGCCTCT ACCCCTGAGG GCACCGCCCC ACCACCCCCT 
651 GCAGCCCCTG CGCCACCCAA GGGGGAGAAG GAGGGGCAGA GACCCACACA 
701 GCCTGTGTAC CAGATCCAGA ACCGGGGCAT GGGCACTGCC GCACCAGCAG 
751 CCATGGACCC TGTCGTGGGT CAGGCCAAAC TACTGCCCCC AGAGCGCATG 
801 AAGCACAGCA TCAAGTTGGT GGATGACCAG ATGAATTGGT GTGACAGTGC 
851 CATCGAGTAC CTGTTGGATC AGACTGATGT GTTGGTGGTT GGTGTCCTGG 
901 GCCTCCAGGG GACAGGCAAG TCCATGGTCA TGTCATTGTT GTCAGCCAAC 
951 ACTCCAGAGG AGGACCAGAG GACTTATGTT TTCCGGGCCC AGAGCGCTGA 
1001 AATGAAGGAA CGAGGGGGCA ACCAGACCAG TGGCATCGAC TTCTTTATTA 
1051 CCCAAGAACG GATTGTTTTC CTGGACACAC AGCCCATCCT GAGCCCTTCT 
1101 ATCCTAGACC ATCTCATCAA TAATGACCGC AAACTGCCTC CAGAGTACAA 
1151 CCTTCCCCAC ACTTACGTTG AAATGCAGTC ACTCCAGATT GCTGCCTTCC 
1201 TTTTCACGGT CTGCCATGTG GTGATTGTTG TCCAGGACTG GTTCACAGAC 
1251 CTCAGTCTCT ACAGGTTCCT GCAGACAGCA GAGATGGTGA AGCCCTCCAC 
1301 CCCATCCCCC AGCCACGAGT CCAGCAGCTC ATCGGGCTCC GATGAAGGCA 
1351 CCGAGTACTA CCCCCACCTA GTCTTCTTGC AGAACAAAGC TCGCCGAGAG 
1401 GACTTCTGTC CTGGGAAGCT GCGGCAGATG CACCTGATGA TTGACCAGCT 
1451 CATGGCCCAC TCCCACCTGC GTTACAAGGG AACTCTGTCC ATGTTACAAT 
1501 GCAATGTCTT CCCGGGGCTT CCACCTGACT TCCTGGACTC TGAGGTCAAC 
1551 TTATTCCTGG TACCCTTCAT GGACAGTGAA GCAGAGAGTG AAAACCCACC 
1601 AAGAGCAGGA CCTGGTTCCA GCCCACTCTT CTCCCTGCTG CCTGGGTATC 
1651 GTGGCCACCC CAGTTTCCAG TCCTTGGTGA GCAAGCTCCG GAGCCAAGTG 
1701 ATGTCCATGG CCCGGCCACA GCTGTCACAC ACGATCCTCA CCGAGAAGAA 
1751 CTGGTTCCAC TACGCTGCCC GGATCTGGGA TGGGGTGAGA AAGTCCTCTG 
1801 CTCTGGCAGA GTACAGCCGC CTGCTGGCCT GAGGCCAAGG AGAGGAATGT 
1851 CATGCAGGGG ACCTCCTGGG TCCGCAGTGT ACTGCGAGGG AGCACAGATG 
1901 TCCATCCCCC GCTGGGGTGG AGAGCGGCAG CAGGCCTGAT GGATGAGGGA 
1951 TCGTGGCTTC CCGGCCCAGA GACATGAGGT GTCCAGGGCC AGGCCCCCCA 
2001 CCCTCAGTTG GGGCTGTTCC GGGGGTGACT GTGAGCGATC CCACCCCAAA 
2051 CCTGAGATGG GGTAGCCCGT CCTGTGTCCT CCACAGGGAC AAGCAGTGGG 
2101 AGGAGTCTGA ATGGTCACCA GGAAGCCCGG GCTCCATCTT GACCTCCTTT 
2151 TTCAGGGACA GGAGCAACAG GCCCCTCTTC CCTGACTCTA AGCCCTTCCC 
2201 TGTAAGGTGA GGCAGGGTCT GGAGAGCTCT TTATTGGAAC AGATCTGGTG 
2251 GTTCAAATAA ACACAGTCAT GCAAAAAAAA AAAAAAAAAA AA 


BLAST Results 


Entry AC004780 from database EMBL: 

Homo sapiens chromosome 19, cosmid F17127, complete sequence. 
Score = 2616, P = 0.0e+00, identities = 524/525 
15 exons Bp 8031-31789 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 270 bp to 1829 bp; peptide length: 520 
Category: similarity to unknown protein 
Prosite motifs: ATP_GTP_A (211-219) 


1 MSESGHSQPG LYGIERRRRW KEPGSGGPQN LSGPGGRERD YIAPWERERR 
51 DASEETSTSV MQKTPIILSK PPAERSKQPP PPTAPAAPPA PAPLEKPIVL 
101 MKPREEGKGP VAVTGASTPE GTAPPPPAAP APPKGEKEGQ RPTQPVYQIQ 
151 NRGMGTAAPA AMDPVVGQAK LLPPERMKHS IKLVDDQMNW CDSAIEYLLD 
201 QTDVLVVGVL GLQGTGKSMV MSLLSANTPE EDQRTYVFRA QSAEMKERGG 
251 NQTSGIDFFI TQERIVFLDT QPILSPSILD HLINNDRKLP PEYNLPHTYV 
301 EMQSLQIAAF LFTVCHVVIV VQDWFTDLSL YRFLQTAEMV KPSTPSPSHE 
351 SSSSSGSDEG TEYYPHLVFL QNKARREDFC PRKLRQMHLM IDQLMAHSHL 
401 RYKGTLSMLQ CNVFPGLPPD FLDSEVNLFL VPFMDSEAES ENPPRAGPGS 
451 SPLFSLLPGY RGHPSFQSLV SKLRSQVMSM ARPQLSHTIL TEKNWFHYAA 
501 RIWDGVRKSS ALAEYSRLLA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22hl3, frame 3 

TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, 
cosmid F17127, complete sequence., N = 2, Score = 1264, P = 1.3e-231 

TREMBL:CEY54E2A_1 gene: "Y54E2A.2"; Caenorhabditis elegans cosmid 
Y54E2A, N = 2, Score = 219, P = 1.4e-15 


>TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid 
F17127, complete sequence. 
Length = 528 

HSPs: 

Score = 1264 (189.6 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 254/302 (84%), Positives = 264/302 (87%) 

Query:  46 ERERRDASEETSTSVMQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPRE 105 
           E+ER D+  + S   +Q+T     +    R  + P      +  A APLEKPIVLMKPRE 
Sbjct:  39 EKER-DSDSDFSP--LQQTEGCQRRDKHFRHAENPHHPLKTSSRA-APLEKPIVLMKPRE 94 

Query: 106 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 165 
           EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 
Sbjct:  95 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 154 

Query: 166 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 225 
           VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 
Sbjct: 155 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 214 

Query: 226 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 285 
           ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 
Sbjct: 215 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 274 

Query: 286 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTP 345 
           DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYR        K ++ 
Sbjct: 275 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRLWDLGCKCKSNSH 334 

Query: 346 SP 347 
           SP 
Sbjct: 335 SP 336 

Score = 993 (149.0 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 189/189 (100%), Positives = 189/189 (100%) 

Query: 332 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 391 
           RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 
Sbjct: 340 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 399 

Query: 392 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 451 
           DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 
Sbjct: 400 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 459 

Query: 452 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 511 
           PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 
Sbjct: 460 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 519 

Query: 512 LAEYSRLLA 520 
           LAEYSRLLA 
Sbjct: 520 LAEYSRLLA 528 


Pedant information for DKFZphfbr2_22hl3, frame 3 
Report for DKFZphfbr2_22hl3.3 


[LENGTH] 520 
[MW] 57650.81 
[pI] 6.52 
[HOMOL] TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid F17127, complete sequence. 0.0 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 8 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 11.73 % 


SEQ MSESGHSQPGLYGIERRRRWKEPGSGGPQNLSGPGGRERDYIAPWERERRDASEETSTSV 
SEG 
PRD cccccccccccccccccccccccccccccccccccccceeeeehhhhhhhhhhccccccee 
MEM 

SEQ MQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPREEGKGPVAVTGASTPE 
SEG xxxxxxxxxxxxxxx 
PRD eeccceeecccccccccccccccccccccccccccceeeeeccccccccceeeecccccc 
MEM 

SEQ GTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPVVGQAKLLPPERMKHS 
SEG xxxxxxxxxxx 
PRD cccccccccccccccccccccccceeeeeeccccccccccccceeecceeecccchhhhh 
MEM 

SEQ IKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLSANTPEEDQRTYVFRA 
SEG xxxxxxxxxxxxxxxxxxx 
PRD hhhhcccchhhhhhhhhhccccceeeeeecccccccchhhhhhhhccccchhhhhheeee 
MEM 

SEQ QSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINNDRKLPPEYNLPHTYV 
SEG 
PRD hhhhhhhcccccceeeeeeeecceeeeeecccccccccccccccccccccccccccccchh 
MEM 

SEQ EMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTPSPSHESSSSSGSDEG 
SEG xxxxxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhheeeeeeeccchhhhhhhhhhhhhhhccccccccccccccccccc 
MEM MMMMMMMMMMMMMM>DyiMMMMMMM 

SEQ TEYYPHLVFLQNKARREDFCPRKLRQMHLMIDQLMAHSHLRYKGTLSMLQCNVFPGLPPD 
SEG 
PRD cccccceeeehhhhhhhcccccchhhhhhhhhhhhhhhhhhccccccccccccccccccc 
MEM 

SEQ FLDSEVNLFLVPFMDSEAESENPPRAGPGSSPLFSLLPGYRGHPSFQSLVSKLRSQVMSM 
SEG 
PRD chhhhhheeeeeccccccccccccccccccccceeeccccccccchhhhhhhhhhhhhhh 
MEM 

SEQ ARPQLSHTILTEKNWFHYAARIWDGVRKSSALAEYSRLLA 
SEG 
PRD hhhhhhhheeeccchhhhhhhhhhhhcchhhhhhhhhccc 
MEM 


Prosite for DKFZphfbr2_22hl3.3 


PS00001   30->34     ASN_GLYCOSYLATION   PDOC00001 
PS00001   251->255   ASN_GLYCOSYLATION   PDOC00001 
PS00002   32->36     GLYCOSAMINOGLYCAN   PDOC00002 
PS00004   507->511   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   180->183   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   215->218   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   491->494   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   117->121   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   193->197   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   228->232   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   254->258   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   277->281   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   298->302   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   355->359   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   436->440   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   26->32     MYRISTYL            PDOC00008 
PS00008   139->145   MYRISTYL            PDOC00008 
PS00008   153->159   MYRISTYL            PDOC00008 
PS00008   211->217   MYRISTYL            PDOC00008 
PS00008   214->220   MYRISTYL            PDOC00008 
PS00008   249->255   MYRISTYL            PDOC00008 
PS00008   356->362   MYRISTYL            PDOC00008 
PS00008   505->511   MYRISTYL            PDOC00008 
PS00017   211->219   ATP_GTP_A           PDOC00017 

(No Pfam data available for DKFZphfbr2_22hl3.3) 
DKFZphfbr2_22i4 


group: brain derived 

DKFZphfbr2_22i4.1 encodes a novel 228 amino acid protein with similarity to the N-terminus of 
human P52rIPK. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to Human P52rIPK N-terminus 
complete cDNA, complete cds, few EST hits 

function of P52rIPK: repressor of P58IPK protein kinase inhibitor 
upstream regulator of interferon induced proteins 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 4748 bp 

Poly A stretch at pos. 4726, polyadenylation signal at pos. 4709 


1 TGGGTCCGGT CCTAGGGTCA CACCCACCGC AGGGTCTGGC TTGGTACAGT 
51 TGGGTGCATG CAGAAGTAGG TGGAGCTGCT GTTGCAGCCT TGAGAGAGTT 
101 TTATTGTAAA ACTCTTGTAA TTTATAGTAA TCGGAGGGGA AAACACCTCT 
151 TCCTTTTAAT TGCTCTGAGG ACCGCTGCCA AAGAAACGCA GTAGATCCGC 
201 TCCCTCTTGG GGGCGGGGAG AAAGAACGGG TTGTGTCCGC CATGTTGGTG 
251 AAGTCAAGCG AAGGCGACTA GAGCTCCAGG AGGGCCAGTT CTGTGGGCTC 
301 TAGTCGGCCA TATTAATAAA GAGAAAGGGA AGGCTGACCG TCCTTCGCCT 
351 CCGCCCCCAC ATACACACCC CTTCTTCCCA CTCCGCTCTC ACGACTAAGC 
401 TCTCACGATT AAGGCACGCC TGCCTCGATT GTCCAGCCTC TGCCAGAAGA 
451 AAGCTTAGCA GCCAGCGCCT CAGTAGAGAC CTAAGGGCGC TGAATGAGTG 
501 GGAAAGGGAA ATGCCGACCA ATTGCGCTGC GGCGGGCTGT GCCACTACCT 
551 ACAACAAGCA CATTAACATC AGCTTCCACA GGTTTCCTTT GGATCCTAAA 
601 AGAAGAAAAG AATGGGTTCG CCTGGTTAGG CGCAAAAATT TTGTGCCAGG 
651 AAAACACACT TTTCTTTGTT CAAAGCACTT TGAAGCCTCC TGTTTTGACC 
701 TAACAGGACA AACTCGACGA CTTAAAATGG ATGCTGTTCC AACCATTTTT 
751 GATTTTTGTA CCCATATAAA GTCTATGAAA CTCAAGTCAA GGAATCTTTT 
801 GAAGAAAAAC AACAGTTGTT CTCCAGCTGG ACCATCTAAT TTAAAATCAA 
851 ACATTAGTAG TCAGCAAGTA CTACTTGAAC ACAGCTATGC CTTTAGGAAT 
901 CCTATGGAGG CAAAAAAGAG GATCATTAAA CTGGAAAAAG AAATAGCAAG 
951 CTTAAGAAGA AAAATGAAAA CTTGCCTACA AAAGGAACGC AGAGCAACTC 
1001 GAAGATGGAT CAAAGCCACG TGTTTGGTAA AGAATTTAGA AGCAAATAGT 
1051 GTATTACCTA AAGGTACATC AGAACACATG TTACCAACTG CCTTAAGCAG 
1101 TCTTCCCTTG GAAGATTTTA AGATCCTTGA ACAAGATCAA CAAGATAAAA 
1151 CACTGCTAAG TCTAAATCTA AAACAGACCA AGAGTACCTT CATTTAAATT 
1201 TAGCTTGCAC AGAGCTTGAT GCCTATCCTT CATTCTTTTC AGAAGTAAAG 
1251 ATAATTATGG CACTTATGCC AAAATTCATT ATTTAATAAA GTTTTACTTG 
1301 AAGTAACATT ACTGAATTTG TGAAGACTTG ATTACAAAAG AATAAAAAAC 
1351 TTCATATGGA AATTTTATTT GAAAATGAGT GGAAGTGCCT TACATTAGAA 
1401 TTACGGACTT AAAAATTTTG CTAATAAATT GTGTGTTTGA AAGGTGTTTT 
1451 TTGTTTTTGT CTTTTTAAAC TACTGTTAAA AGAACAGCTT ATGATAAGTA 
1501 ATATGTTTAA CTTAGAGAAG AATTTTTTCC TGTACCAAAG TTGGCATATT 
1551 GCATTCTAAA TAAGATGCTA AATAAGAGTT AACCAACATT CAACATGACC 
1601 TTAAAACTGC TGGGTTTTGT ATTAATTAAA TTATAATTGG CACTGTGATT 
1651 TGAAAAATTT ATAGAAAAAA AGGTACAGGG CAAGTTTTTA AATTAAAACT 
1701 TTCTATATTT TGTTTTACCA GTAAAAGTGA GCTTATCATG GCCTCTCTCA 
1751 TAAGAATGAT TTTAAAATAG GTTGTAAAAT ATTTTGAAAA TATTTGAATG 
1801 TGAAGTACCA TTGAGTCATC CAAACTAGGT AAGGCCTCAA GTACTTTAAA 
1851 CTAGTAAAAT CTAGTAGCTG ATAATATTCA CCTAAGTAAG TGTTGTAAAA 
1901 TAATTCAGAG TTCAGGACCT AGCTTAGATA AATGTATACT ACTCTTTTTC 
1951 TCATAGTAAA AATCTTACAT TTCCAACTTC AAAATTGGTG CTTCCATATT 
2001 TGTTGATAAC CAAAACTCCT AAGGTTTTTT GTTTTCTTTT TAACTACTTT 
2051 CCAAATGCAT ACTATACCTC AGAAATAGTG TATCAATATA GTGGGCTTTT 
2101 TTTTTCCTCT TCATAAACCC ACAGTAAAAT TTAATCACAG GAAACTACTT 
2151 ATATCTTCAC ACTTTGTATT GATAACTTAA AATGGCATCA GTTTATCTTA 
2201 GACATCAGCT TGCTTTTTAT CTCCTTTTTT AGTGAGTGAA ATAGAGCAAC 
2251 TAGCATGCCT GTGTTCCCAG CTACTTGGGA GGCTAAGGTG GGAAGATCAA 
2301 TTGAACCTAG GAGGTTGAGG CTATAGTGAG CTGTGATTGC ACGACTGCAC 
2351 TCCAGCCTGG GCAATGGAGT GAGACTCCTG TCTCTAAAAC AGCAACAACA 
2401 AAAATAAAGC AACCATAGTG CATAAGGGAA ATTAAATGTT CCCTATAGAA 
2451 ATATGTGTAT GTCTGTGATA GTGGTATGCA AATGCTAATT ATTTTATAAA 
2501 ATAAAAGTTC AGAACTATTC TTATCATTGC CACTTGAACA ATTAAAGGGT 
2551 TTGCTTTATT TCACTAATGT TTAATAGGAA CCCTTTGCTT CAAACAGCTT 
2601 TGTTGAAATC ATGTAAAAAT TTGTTAATAG AGAATCAAGT TATTTAACTC 
2651 AACTTATTTA ATTCAAGCTT GTGATACTAA CATACAAAGG TAGCATAAAC 
2701 CAAGTCATAA ATTGCTGTAA TCTTTCCTGT AGAGTAATAG CTACTTCATG 
2751 ATTTTTTTAA AAATTTCATT TTTTTGCTAT TTAGGATTGC ATTTGCTTGG 
2801 CTCCTAGTAA CAATTCTTTT ACAGTATTAG CACTCTCTTT ACTAAGGAAT 
2851 GCCTCCCAAG GAAATGCAAA GGTAGGAAAA GTCTCTTAGA ATGCCCATGA 
2901 GGTATTTAAA ACAGATATTT ATGAAAATCT TTTTGTGAAT GTTATAAATC 
2951 TTGCTAGTTA TTTTATCTTT ATCTTAAGTA TTAGATGTAG TTCCTTGGAA 
3001 TTGTCATTAC ATATTTATTT TTTTCTAGTG TGGTTTCAAA TAACTTTTTG 
3051 CCAACATATA ATCATCATCA AACATTCACT GACCATATCT ATTTTATAAC 
3101 TCAAAATAAG TTGGACAAAT AATCATTTTA ATAAAAACTA TTTTTTCCAA 
3151 GTATAACCAC TGTCATGTGG TTCACCCTTC ACCCCAGATA CAAAACACTT 
3201 ATTTGTGTAG CCCAGTTCCC ATCTACAGTA ATACCTTGAA ACCTTAATAA 
3251 ATTTTAAAAA TCATAAAAAT AAAATATTGT AAAATACAAC AAATTTTGGA 
3301 CAAGGTTACT TCATCTTCAT TCATTATTAC CTGACAGTAT TAAACTACTA 
3351 CTCAATAATT TTAGAGTAAA CTTTTCTGTG TTTTCCCCGT GATTTTCATT 
3401 GTGCTGTCCT GACAACATGC TCCAAACTCT TTGCATCAAA TTGTTTTATT 
3451 AACATACATT TGTCTACCTT AAAACTAGCT TTATTCACAG AGAAAGACCT 
3501 AAAAGGAGTC TATTAAAATG CTGCTTTCAG TTTGATAGTT TTTTTTTTAA 
3551 TCACTCTGAC CATAAACTAA CTGAAATTAT AATGGATTTT TTTTCCTCTC 
3601 CCGGTCACAA CACAGATCTT CTGTTCATTT GTTCTCTGTC TACTGGGCAC 
3651 CAACCTCTAC AAAGAACCAG CCAAAGGCTA GGTACTTGAT ATAAAAAGGA 
3701 ATATTACATT ATTTTCTGCC CTCAAGTTGC TCTATCTCCT GAAAGAAACA 
3751 AGTAATATTT ATAATACAAT ATGATAAATG CTACAAAAGA AATAGCTGTA 
3801 AAGTCCTTTG GTAAATGCTG TTGAATTGGA ATTCAGTAAG AACTATAAAC 
3851 TGTAGACCTT TTTATAATCA AATGCTTTTG TCTTGAAACA AAACAGATTC 
3901 CTCCTTATAT TGACTTAGCA AAGGAGGTAC AAGGACATTG GCATTTGACC 
3951 TGAATTATGG TGTTTTATTG AATGAGCTAT AAGACAACAT TTTTACCCTT 
4001 TAAAATGAAC ACTGAACAAA TGTGTTAATG GTATCTTTGT TAAAAGGAAA 
4051 ACATAGCTAT AAATAAAATA CTACATCGAA ATCCAGCACT GGAGTTCATT 
4101 TGAAATTTGA TATTTTGTGT AAAGTAACAA ACCTATTAAC ACAGATTTTT 
4151 AAAATAACTC AGAATCGTAT AAAGCACTTT GGTACTTATT TGTTCTCTTT 
4201 TCCCTTACAT TCTGTGTGGT AGGTGGTATT ATCTCTGATT TACACATGAA 
4251 GACATCCTTG TTAATGCAAT TTATTTATTC ATTCGGGCAT TTACTGTGTG 
4301 CCAACTTGCA AAAGGAATAG AAATGTCTGT GATCTAGATA GTTCTAGATT 
4351 GAACATAGAT TTTCTGCCAA CAAATCCTCT CTGCTGTTCA CATTATCCTT 
4401 TGTTTAACGT ATGAACCAGG TTACTAAAAT AGGATAAATC ATGTGTCTTA 
4451 GAATATGAAA ATAGTAAGGT CTTTGAGGTC ACTTGATCTT CTCTAAGTAG 
4501 ACTTTATAAT ATTGTGTTTT ATCTCATTTC TCAATATTAG AATACGGGTA 
4551 GATTTTAATT TTGCTATAAT ATAGGAAATG GTTCATCTTT GTACCAAAAT 
4601 ATTGCATTCT TCTGATATTT AGACAGTTGG AAACTTTCTA AAATTGAGGA 
4651 TTTTGTAGTG TATACTAAAT AATTGCATAT TCAAAAAAAT GTATTCTGAG 
4701 TATGGTGATA TTAAACATTT TTCCCCAAAA AAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


98107671: 

Regulation of interferon-induced protein kinase PKR: 
modulation of P58IPK inhibitory function by a novel protein. 


Peptide information for frame 1 


ORF from 511 bp to 1194 bp; peptide length: 228 
Category: similarity to known protein 


1 MPTNCAAAGC ATTYNKHINI SFHRFPLDPK RRKEWVRLVR RKNFVPGKHT 
51 FLCSKHFEAS CFDLTGQTRR LKMDAVPTIF DFCTHIKSMK LKSRNLLKKN 
101 NSCSPAGPSN LKSNISSQQV LLEHSYAFRN PMEAKKRIIK LEKEIASLRR 
151 KMKTCLQKER RATRRWIKAT CLVKNLEANS VLPKGTSEHM LPTALSSLPL 
201 EDFKILEQDQ QDKTLLSLNL KQTKSTFI 


BLASTP hits 


Entry AF007393_1 from database TREMBL: 

product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 
Score = 166, P = 2.5e-11, identities = 40/106, positives = 56/106 

Alert BLASTP hits for DKFZphfbr2_22i4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22i4, frame 1 

Report for DKFZphfbr2_22i4.1 


[LENGTH] 228 
[MW] 26259.94 
[pI] 10.17 
[HOMOL] TREMBL:AF007393_1 product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 1e-09 
[PROSITE] MYRISTYL 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 7.02 % 


SEQ MPTNCAAAGCATTYNKHINISFHRFPLDPKRRKEWVRLVRRKNFVPGKHTFLCSKHFEAS 
SEG 
PRD cccccccccccccccccccceeeecccccchhhhhhhhhhhhhcccccceeehhhhhhhh 

SEQ CFDLTGQTRRLKMDAVPTIFDFCTHIKSMKLKSRNLLKKNNSCSPAGPSNLKSNISSQQV 
SEG xxxxxxxxxxxxxxxx 
PRD cccccccccccccccccceeeeccccchhhhhhhhhhhccccccccccccccccccchhh 

SEQ LLEHSYAFRNPMEAKKRIIKLEKEIASLRRKMKTCLQKERRATRRWIKATCLVKNLEANS 
SEG 
PRD hhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeccc^^ 

SEQ VLPKGTSEHMLPTALSSLPLEDFKILEQDQQDKTLLSLNLKQTKSTFI 
SEG 
PRD cccccccccccccccccccccchhhhhhcccccccccccccccccccc 


Prosite for DKFZphfbr2_22i4.1 


PS00001   19->23     ASN_GLYCOSYLATION   PDOC00001 
PS00001   100->104   ASN_GLYCOSYLATION   PDOC00001 
PS00001   114->118   ASN_GLYCOSYLATION   PDOC00001 
PS00004   160->164   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   68->71     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   88->91     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   147->150   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   163->166   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   60->64     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   78->82     CK2_PHOSPHO_SITE    PDOC00006 
PS00008   9->15      MYRISTYL            PDOC00008 


(No Pfam data available for DKFZphfbr2_22i4.1) 
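
The ASN_GLYCOSYLATION entries in the Prosite listing above correspond to PROSITE pattern PS00001, N-{P}-[ST]-{P}. As a minimal sketch of how such start positions can be reproduced, the pattern can be written as a regular expression with a lookahead so that overlapping sites are not missed. `find_nglyc_sites` is a hypothetical helper name, and the end coordinate convention used in the listing is not reproduced here.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}.
# The lookahead lets overlapping candidate sites (e.g. "NNSS") both be found.
NGLYC_PATTERN = re.compile(r"(?=N[^P][ST][^P])")


def find_nglyc_sites(peptide: str) -> list[int]:
    """Return 1-based start positions of N-glycosylation pattern matches."""
    return [m.start() + 1 for m in NGLYC_PATTERN.finditer(peptide.upper())]


# Residues 11-25 of the frame 1 peptide; adding the offset of 10 maps the
# match back to residue 19, the first ASN_GLYCOSYLATION start listed above.
fragment = "ATTYNKHINISFHRF"
print([pos + 10 for pos in find_nglyc_sites(fragment)])  # [19]
```

A pattern match only marks a candidate site; as the summary for this clone notes, none of these short motifs is considered predictive on its own.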
DKFZphfbr2_22k3 


group: brain derived 

DKFZphfbr2_22k3 encodes a novel 538 amino acid protein with weak similarity to extensins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


weak similarity to extensins 

complete cDNA, complete cds, few EST hits 
CpG island in 5' UTR complete cDNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2775 bp 

Poly A stretch at pos. 2755, polyadenylation signal at pos. 2718 


1 GGGGCTGCCC GCGCGCTCCA CGGTGCAGAG CTCTAAGCGC GCGGGCTGGC 
51 AGGCTGCGGC GCGTCAAGGT CAGCCTGGAG CTGGGTGGCG GCCTGCCTGG 
101 GGGCGGGGGA CCCTACTGGA GGCCCGGGCT GGGGCCTCCC AGCGCCTCGG 
151 CCATATTGAA TAGCTTCGAC TGGACCGTCT TTGTCTGCGA AGTCCTGTCC 
201 CAAGTTCCAG CCGCGTCCCT GGGGCCTGGG GCAGGAAGAG TCGCTGGCAG 
251 CCCGCGCGCC CCAACTTGGA GCTGGGACAC CACGTTTCCA GCTTGGAGTG 
301 GGCCTTGAGC CTTGGGACTG ACCTCGCCCC CGGCTCACGT AGGCATCCTG 
351 GAAATTGATT CCCCCAAGTC CTTGGTGGGG GAGCCGGACT TGGTCAAGAC 
401 TGTACTTGTT GCAGGCGAAG AGATTGGAGG CGTTTGGCTC GTCCCTGGCT 
451 AGGGAGGTGA GACTCTCCGG TCAGCGTTGC TGGAACTCCC CCCATCCAGT 
501 CCCTCCCTCA AGACTAAGGG CTACAGTAGT TTGTTGGGGC TCATTGCCCC 
551 CTCACCCCAG ATATCACCCT GGAGATCTTA AAGACTCTCG AGAAAAGCCA 
601 CGTGGGGGGC TGGTTCCCCT GGGGCTTCCT GCCGTCCCCC GACTGCCTCA 
651 TTCTTTGGAG CGTCCCCGAT GTCTGCAAAG ATGTGGATTT GGACGTCCTC 
701 GTGGAAGCCC TAAAGCCCGT GGGGACATTT AAGAAGATCG GCAAGGTGTT 
751 CCGCAAGGAG GAGGACTCCA CGGTGGGGAT GCTGCAGATC GGGGAGGACG 
801 TCGACTATTT GCTCATCCCC CGGGAGGTCA GGCTGGCTGG GGGCGTCTGG 
851 AGAGTCATCT CTAAGCCCGC CACCAAGGAA GCAGAATTTC GGGAGCGGCT 
901 GACCCAGTTC CTGGAAGAAG AGGGCCGCAC CCTGGAGGAC GTGGCCCGCA 
951 TCATGGAGAA GAGCACCCCG CACCCGCCCC AGCCCCCCAA AAAGCCCAAG 
1001 GAGCCCCGAG TGAGGAGGAG AGTGCAGCAG ATGGTGACTC CTCCGCCCCG 
1051 GCTGGTCGTG GGCACGTACG ACAGCAGCAA CGCCAGCGAC AGCGAGTTCA 
1101 GCGACTTCGA GACCTCCAGA GACAAGAGCC GCCAGGGCCC GCGGCGGGGC 
1151 AAGAAGGTGC GCAAAATGCC CGTCAGCTAC CTGGGCAGCA AGTTCCTGGG 
1201 AAGCGACCTG GAGAGTGAGG ATGATGAGGA ACTGGTCGAG GCCTTCCTCC 
1251 GGCGACAGGA GAAGCAGCCC AGCGCGCCGC CTGCCCGCCG CCGCGTCAAC 
1301 CTGCCAGTGC CCATGTTTGA GGACAACCTG GGGCCTCAGC TGTCCAAAGC 
1351 GGACAGGTGG CGGGAGTATG TCAGCCAGGT GTCCTGGGGG AAGCTGAAGC 
1401 GGAGGGTGAA GGGTTGGGCG CCGAGGGCGG GCCCCGGGGT GGGCGAGGCC 
1451 CGGCTGGCCT CCACCGCAGT GGAGAGCGCA GGGGTATCAT CGGCGCCAGA 
1501 GGGCACCAGC CCGGGGGATC GCTTGGGAAA CGCGGGAGAT GTTTGTGTGC 
1551 CCCAGGCTTC CCCTAGGCGA TGGAGGCCCA AGATCAACTG GGCCTCCTTT 
1601 CGGCGCCGCA GGAAGGAGCA GACAGCACCC ACAGGTCAGG GGGCAGACAT 
1651 CGAGGCTGAT CAGGGGGGAG AGGCTGCAGA TAGTCAAAGG GAAGAGGCCA 
1701 TAGCTGACCA GCGGGAAGGG GCTGCAGGTA ATCAGAGGGC TGGGGCCCCA 
1751 GCTGACCAGG GGGCAGAGGC TGCAGATAAT CAGAGGGAAG AGGCTGCAGA 
1801 TAATCAGAGG GCAGGGGCCC CAGCTGAGGA GGGGGCAGAG GCTGCAGATA 
1851 ACCAGAGGGA AGAGGCTGCA GATAATCAGA GGGCAGAGGC CCCAGCTGAC 
1901 CAGAGGTCAC AGGGCACAGA TAACCACAGG GAAGAGGCTG CAGATAATCA 
1951 GAGGGCGGAG GCCCCAGCTG ACCAGGGGTC AGAGGTTACA GATAATCAAA 
2001 GGGAAGAGGC CGTACATGAC CAGAGGGAAA GGGCCCCAGC TGTCCAGGGT 
2051 GCAGATAATC AGAGGGCACA GGCCCGGGCT GGCCAGAGGG CAGAGGCTGC 
2101 ACATAATCAG AGGGCAGGGG CCCCAGGTAT CCAGGAAGCT GAAGTCTCAG 
2151 CTGCCCAAGG GACCACAGGA ACAGCTCCAG GAGCCAGGGC CCGGAAACAG 
2201 GTCAAGACAG TGAGGTTCCA GACCCCTGGA CGCTTTTCGT GGTTTTGCAA 
2251 GCGCCGGAGA GCCTTCTGGC ACACTCCCCG GTTGCCAACC CTGCCCAAGA 
2301 GAGTCCCCAG GGCAGGAGAG GTCAGGAACC TCAGGGTGCT GAGGGCGGAG 
2351 GCCAGAGCAG AAGCTGAGCA GGGAGAGCAA GAAGACCAGC TGTGAGGTGA 
2401 GGGCTAGAGA CAGCCCACGG GCCCTCCCTC CAAGTGTGGG AGGGAGAGAT 
2451 GCTCTGCCTC TGAACTTCAA AGTGGAGGTG GAGTGCTGGC CACGTCTCCA 
2501 CCTAACAACC CTCTTTATTC TCTTGTTAAA GTTTTGTTCA TGCTTTGATT 
2551 TTTTTTTAAA TTTTTTAGAG ACAGGGTCTC ACTCTGTTGC CCAGGCTGGA 
2601 GTGCAGTGGC ATGATCATAA CTCACTGCAG CCTCAAACTT CTGGCCTCAA 
2651 GTGATCCTCC TGCCTCGGCC TCCCAAAATG CTGGGATTAC AGATGTGAGC 


2701 CACCACACAC ACCATCTGAT TAAAAAAAAA AAATACTGAT TCCCTGTAGC 
2751 AACCCAAAAA AAAAAAAAAA AAAAA 


BLAST Results 


Entry HS164A7F from database EMBL : 

H. sapiens CpG island DNA genomic Msel fragment, clone 164a7, forward 
read cpg164a7.ft1a. 
Score = 740, P = 3.0e-25, identities = 150/151 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 779 bp to 2392 bp; peptide length: 538 
Category: similarity to known protein 


1 MLQIGEDVDY LLIPREVRLA GGVWRVISKP ATKEAEFRER LTQFLEEEGR 
51 TLEDVARIME KSTPHPPQPP KKPKEPRVRR RVQQMVTPPP RLVVGTYDSS 
101 NASDSEFSDF ETSRDKSRQG PRRGKKVRKM PVSYLGSKFL GSDLESEDDE 
151 ELVEAFLRRQ EKQPSAPPAR RRVNLPVPMF EDNLGPQLSK ADRWREYVSQ 
201 VSWGKLKRRV KGWAPRAGPG VGEARLASTA VESAGVSSAP EGTSPGDRLG 
251 NAGDVCVPQA SPRRWRPKIN WASFRRRRKE QTAPTGQGAD IEADQGGEAA 
301 DSQREEAIAD QREGAAGNQR AGAPADQGAE AADNQREEAA DNQRAGAPAE 
351 EGAEAADNQR EEAADNQRAE APADQRSQGT DNHREEAADN QRAEAPADQG 
401 SEVTDNQREE AVHDQRERAP AVQGADNQRA QARAGQRAEA AHNQRAGAPG 
451 IQEAEVSAAQ GTTGTAPGAR ARKQVKTVRF QTPGRFSWFC KRRRAFWHTP 
501 RLPTLPKRVP RAGEVRNLRV LRAEARAEAE QGEQEDQL 

BLASTP hits 

Entry RNU67136_1 from database TREMBL: 

"A-kinase anchoring protein AKAP150"; Rattus norvegicus 
A-kinase anchoring protein AKAP150 mRNA, complete cds. Rattus 
norvegicus (Norway rat) 
Length = 714 

Score = 182 (64.1 bits), Expect = 1.2e-10, P = 1.2e-10 
Identities = 73/257 (28%), Positives = 104/257 (40%) 


Alert BLASTP hits for DKFZphfbr2_22k3, frame 2 

TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 
S-antigen gene, complete cds., N = 1, Score = 178, P = 3.7e-11 


>TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 
S-antigen gene, complete cds. 
Length = 285 

HSPs: 

Score = 178 (26.7 bits), Expect = 3.7e-11, P = 3.7e-11 
Identities = 60/217 (27%), Positives = 97/217 (44%) 

Query: 269 INWASFRRRRKEQTAPTGQGA-DIEADQGGEAADSQRE-EAIADQ---REGAAGNQRAGA 323 

+N + + . + E G+G D E E +D+ E E I Q E A N+ AG+ 

Sbjct: 47 LNGKNGKGNKYEDLQEEGEGENDDEEHSNSEESDNDEENEIIVGQDGSNEKAGSNEEAGS 106 

Query: 324 PADQGAEAADNQREEAADNQRAGAPAEEGA--EAADNQR----EEAADNQRAEAPADQRS 377 

G+ E+A N++AG+ E G+ EA M+ EEA N++A + S 

Sbjct: 107 NEKAGSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSHEEAGSNEEAGSNEKAGSNEKAGS 166 

Query: 378 QGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQAR--AG 435 

EEA N++A + + GS E+A +++ + G+ N++A + AG 

Sbjct: 167 NEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSHEKAGSNEEAGS-NEKAGSNEEAG 225 

Query: 436 QRAEAAHNQRAGA---PGIQEAEVSAAQGTTGTA-PGA 469 
EA AG+ G E + +G GT PG+ 

Sbjct: 226 SNEEAGSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGS 263 

Score = 173 (26.0 bits), Expect = 1.5e-10, P = 1.5e-10 
Identities = 51/190 (26%), Positives = 83/190 (43%) 


Query: 279 KEQTAPTGQ-GADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQRE 337 

+-E GQ G++ +A EA ++-»- A £ A N++AG+ G+ E 
Sbjct: 83 EENEIIVGQDGSNEKAGSNEEAGSNEK----AGSNEEAGSNEKAGSNEKAGSNEEAGSNE 138 

Query: 338 EAADNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPA 397 

EA N+ AG+ E G+ E+A N++A + + S EEA N++A + 

Sbjct: 139 EAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNE 198 

Query: 398 DQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVS 457 

GS EEA +++ + G++ + AG EA N+ AG+ EA 

Sbjct: 199 KAGSKEKAGSNEEAGSNEKAGSNEEAGSNEE-----AGSNEEAGSNEEAGSNEGSEAGTE 253 

Query: 458 AAQGTTGTAPG 468 

+GT G G 
Sbjct: 254 GPKGTGGPGSG 264 


Score = 147 (22.1 bits), Expect = 1.6e-07, P = 1.6e-07 
Identities = 40/168 (23%), Positives = 70/168 (41%) 


Query: 288 GADIEADQGGEAADSQR--EEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRA 345 

G++ EA +A +++ A E A N+ AG+ + G+ E+A N++A 

Sbjct: 111 GSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKA 170 

Query: 346 GAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTD 405 

G+ E G+ EEA N++A + S EEA H++A + + GS 

Sbjct: 171 GSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEEA 230 

Query: 406 NQREEAVHDQR--ERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGI 451 

EEA ++ + G + + G E +HN++ I 

Sbjct: 231 GSKEEAGSNEEAGSNEGSEAGTEGPRGTGGPGSGGEHSHNKRKSKKSI 278 

Score = 101 (15.2 bits), Expect = 2.5e-02, P = 2.4e-02 
Identities = 26/100 (26%), Positives = 47/100 (47%) 

Query: 281 QTAPTGQGADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAA 340 

+ A + + A + G EEA ++++ G+ N++AG+ G+ E+A 

Sbjct: 162 EKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGS--NEKAGSNEKAGSNEEAGSNEKAG 219 

Query: 341 DNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGT 380 

N+ AG+ E G+ EEA N+ +EA + +GT 

Sbjct: 220 SNEEAGSNEEAGSNEEAGSNEEAGSNEGSEA-GTEGPKGT 258 


Pedant information for DKFZphfbr2_22k3, frame 2 


Report for DKFZphfbr2_22k3.2 


[LENGTH] 538 
[MW] 59402.19 
[pI] 8.72 
[HOMOL] TREMBL:AF037364_1 gene: "MA1"; product: "paraneoplastic neuronal antigen MA1"; Homo sapiens paraneoplastic neuronal antigen MA1 (MA1) mRNA, complete cds. 4e-10 
[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 12 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] PKC_PHOSPHO_SITE 6 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 18.03 % 


SEQ MLQIGEDVDYLLIPREVRLAGGVWRVISKPATKEAEFRERLTQFLEEEGRTLEDVARIME 
SEG 
PRD cccccccccccccccccccccceeeeeeecccchhhhhhhhhhhhhhhccchhhhhhhhh 

SEQ KSTPHPPQPPKKPKEPRVRRRVQQMVTPPPRLVVGTYDSSNASDSEFSDFETSRDKSRQG 
SEG xxxxxxxxxxxxxxxxxxx 
PRD hcccccccccccccccchhhhhhhhhccccceeeeecccccccccccccccccccccccc 

SEQ PRRGKKVRKMPVSYLGSKFLGSDLESEDDEELVEAFLRRQEKQPSAPPARRRVNLPVPMF 
SEG xxxxxxxxxxx 
PRD ccccccccccceeeccccccccccccchhhhhhhhhhhhhhccccccchhhhhccccccc 

SEQ EDNLGPQLSKADRWREYVSQVSWGKLKRRVKGWAPRAGPGVGEARLASTAVESAGVSSAP 
SEG 
PRD cccccccchhhhhhhhhheeeeccchhhhhhccccccccccchhhhhhhhhhhccc^ 

SEQ EGTSPGDRLGNAGDVCVPQASPRRWRPKINWASFRRRRKEQTAPTGQGADIEADQGGEAA 
SEG 
PRD cccccccccccccceeeecccccccccccchhhhhhhhhhhhhcccccchh^ 

SEQ DSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRAGAPAEEGAEAADNQR 
SEG xxxxxxxxxxxxx xxxxxxxxxxxx 
PRD *»l»hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhccccchhhhhhhhhhh 

SEQ EEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAP 
SEG 
PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhh^^ 

SEQ AVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVSAAQGTTGTAPGARARKQVKTVRF 
SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 
PRD h^ccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccccccchhhhhhhhhhh 

SEQ QTPGRFSWFCKRRRAFWHTPRLPTLPKRVPRAGEVRNLRVLRAEARAEAEQGEQEDQL 
SEG xxxxxxxxxxxxxx 
PRD cccccceeehhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccc 


Prosite for DKFZphfbr2_22k3.2 


PS00001   101->105   ASN_GLYCOSYLATION   PDOC00001 
PS00005   112->115   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   261->264   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   273->276   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   302->305   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   477->480   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   499->502   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   51->55     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   103->107   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   108->112   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   112->116   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   142->146   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   146->150   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   189->193   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   229->233   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   238->242   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   244->248   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   302->306   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   95->101    MYRISTYL            PDOC00008 
PS00008   220->226   MYRISTYL            PDOC00008 
PS00008   242->248   MYRISTYL            PDOC00008 
PS00008   296->302   MYRISTYL            PDOC00008 
PS00008   314->320   MYRISTYL            PDOC00008 
PS00008   317->323   MYRISTYL            PDOC00008 
PS00008   328->334   MYRISTYL            PDOC00008 
PS00008   352->358   MYRISTYL            PDOC00008 
PS00008   400->406   MYRISTYL            PDOC00008 
PS00008   450->456   MYRISTYL            PDOC00008 
PS00008   461->467   MYRISTYL            PDOC00008 
PS00008   464->470   MYRISTYL            PDOC00008 
PS00009   123->127   AMIDATION           PDOC00009 

(No Pfam data available for DKFZphfbr2_22k3.2) 
DKFZphfbr2_22k8 


group: brain derived 

DKFZphfbr2_22k8 encodes a novel 172 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 
Locus: /map="7" 
Insert length: 2789 bp 

Poly A stretch at pos. 2769, polyadenylation signal at pos. 2756 
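
The poly(A) stretch and polyadenylation signal positions quoted above can be located with a simple scan from the 3' end. This is only a sketch: `find_polya_features` is a hypothetical helper, it considers just the canonical AATAAA hexamer and the common ATTAAA variant, and it returns zero-based offsets, which is how the reported positions appear to be counted when compared with the listing (an inference, not something the text states).

```python
from typing import Optional, Tuple

# Canonical polyadenylation hexamer and its most common variant.
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_features(cdna: str, min_tail: int = 10) -> Optional[Tuple[int, int]]:
    """Return (signal_start, tail_start) as zero-based offsets, or None.

    tail_start is the first base of the terminal run of A residues;
    signal_start is the last hexamer that lies entirely upstream of that run.
    """
    seq = cdna.upper()
    tail_start = len(seq.rstrip("A"))
    if len(seq) - tail_start < min_tail:
        return None  # no convincing poly(A) tail
    signal_start = max(seq.rfind(hexamer, 0, tail_start) for hexamer in SIGNALS)
    if signal_start < 0:
        return None  # tail present, but no signal hexamer upstream of it
    return signal_start, tail_start


# Last 39 bases of a 2789 bp insert (listing positions 2751-2789).
tail_region = "AATAAAAATAAATTTGGGCA" + "A" * 19
signal, tail = find_polya_features(tail_region)
print(signal + 2750, tail + 2750)  # 2756 2769
```

With the 2750-base offset added back, the sketch returns the same two positions quoted for this clone.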


1 GGGGGAGCCA TGAGGCGCCA GCCTGCGAAG GTGGCGGCGC TGCTGCTCGG 
51 GCTGCTCTTG GAGTGCACAG AAGCCAAAAA GCATTGCTGG TATTTCGAAG 
101 GACTCTATCC AACCTATTAT ATATGCCGCT CCTACGAGGA CTGCTGTGGC 
151 TCCAGGTGCT GTGTGCGGGC CCTCTCCATA CAGAGGCTGT GGTACTTCTG 
201 GTTCCTTCTG ATGATGGGCG TGCTTTTCTG CTGCGGAGCC GGCTTCTTCA 
251 TCCGGAGGCG CATGTACCCC CCGCCGCTGA TCGAGGAGCC AGCCTTCAAT 
301 GTGTCCTACA CCAGGCAGCC CCCAAATCCC GGCCCAGGAG CCCAGCAGCC 
351 GGGGCCGCCC TATTACACTG ACCCAGGAGG ACCGGGGATG AACCCTGTCG 
401 GGAATTCCAC GGCAATGGCT TTCCAGGTCC CACCCAACTC ACCCCAGGGG 
451 AGTGTGGCCT GCCCGCCCCC TCCAGCCTAC TGCAACACGC CTCCGCCCCC 
501 GTACGAACAG GTAGTGAAGG CCAAGTAGTG GGGTGCCCAC GTGCAAGAGG 
551 AGAGACAGGA GAGGGCCTTT CCCTGGCCTT TCTGTCTTCG TTGATGTTCA 
601 CTTCCAGGAA CGGTCTCGTG GGCTGCTAAG GGCAGTTCCT CTGATATCCT 
651 CACAGCAAGC ACAGCTCTCT TTCAGGCTTT CCATGGAGTA CAATATATGA 
701 ACTCACACTT TGTCTCCTCT GTTGCTTCTG TTTCTGACGC AGTCTGTGCT 
751 CTCACATGGT AGTGTGGTGA CAGTCCCCGA GGGCTGACGT CCTTACGGTG 
801 GCGTGACCAG ATCTACAGGA GAGAGACTGA GAGGAAGAAG GCAGTGCTGG 
851 AGGTGCAGGT GGCATGTAGA GGGGCCAGGC CGAGCATCCC AGGCAAGCAT 
901 CCTTCTGCCC GGGTATTAAT AGGAAGCCCC ATGCCGGGCG GCTCAGCCGA 
951 TGAAGCAGCA GCCGACTGAG CTGAGCCCAG CAGGTCATCT GCTCCAGCCT 
1001 GTCCTCTCGT CAGCCTTCCT CTTCCAGAAG CTGTTGGAGA GACATTCAGG 
1051 AGAGAGCAAG CCCCTTGTCA TGTTTCTGTC TCTGTTCATA TCCTAAAGAT 
1101 AGACTTCTCC TGCACCGCCA GGGAAGGATA GCACGTGCAG CTCTCACCGC 
1151 AGGATGGGGC CTAGAATCAG GCTTGCCTTG GAGGCCTGAC AGTGATCTGA 
1201 CATCCACTAA GCAAATTTAT TTAAATTCAT GGGAAATCAC TTCCTGCCCC 
1251 AAACTGAGAC ATTGCATTTT GTGAGCTCTT GGTCTGATTT GGAGAAAGGA 
1301 CTGTTACCCA TTTTTTTGGT GTGTTTATGG AAGTGCATGT AGAGCGTCCT 
1351 GCCCTTTGAA ATCAGACTGG GTGTGTGTCT TCCCTGGACA TCACTGCCTC 
1401 TCCAGGGCAT TCTCAGGCCC GGGGGTCTCC TTCCCTCAGG CAGCTCCAGT 
1451 GGTGGGTTCT GAAGGGTGCT TTCAAAACGG GGCACATCTG GCCGGGAAGT 
1501 CACATGGACT CTTCCAGGGA GAGAGACCAG CTGAGGCGTC TCTCTCTGAG 
1551 GTTGTGTTGG GTCTAAGCGG GTGTGTGCTG GGCTCCAAGG AGGAGGAGCT 
1601 TGCTGGGAAA AGACAGGAGA AGTACTGACT CAACTGCACT GACCATGTTG 
1651 TCATAATTAG AATAAAGAAG AAGTGGTCGG AAATGCACAT TCCTGGATAG 
1701 GAATCACAGC TCACCCCAGG ATCTCACAGG TAGTCTCCTG AGTAGTTGAC 
1751 GGCTAGCGGG GAGCTAGTTC CGCCGCATAG TTATAGTGTT GATGTGTGAA 
1801 CGCTGACCTG TCCTGTGTGC TAAGAGCTAT GCAGCTTAGC TGAGGCGCCT 
1851 AGATTACTAG ATGTGCTGTA TCACGGGGAA TGAGGTGGGG GTGCTTATTT 
1901 TTTAATGAAC TAATCAGAGC CTCTTGAGAA ATTGTTACTC ATTGAACTGG 
1951 AGCATCAAGA CATCTCATGG AAGTGGATAC GGAGTGATTT GGTGTCCATG 
2001 CTTTTCACTC TGAGGACATT TAATCGGAGA ACCTCCTGGG GAATTTTGTG 
2051 GGAGACACTT GGGAACAAAA CAGACACCCT GGGAATGCAG TTGCAAGCAC 
2101 AGATGCTGCC ACCAGTGTCT CTGACCACCC TGGTGTGACT GCTGACTGCC 
2151 AGCGTGGTAC CTCCCATGCT GCAGGCCTCC ATCTAAATGA GACAACAAAG 
2201 CACAATGTTC ACTGTTTACA ACCAAGACAA CTGCGTGGGT CCAAACACTC 
2251 CTCTTCCTCC AGGTCATTTG TTTTGCATTT TTAATGTCTT TATTTTTTGT 
2301 AATGAAAAAG CACACTAAGC TGCCCCTGGA ATCGGGTGCA GCTGAATAGG 
2351 CACCCAAAAG TCCGTGACTA AATTCCGTTT GTCTTTTTGA TAGCAAATTA 
2401 TGTTAAGAGA CAGTGATGGC TAGGGCTCAA CAATTTTGTA TTCCCATGTT 
2451 TGTGTGAGAC AGAGTTTGTT TTCCCTTGAA CTTGGTTAGA ATTGTGCTAC 
2501 TGTGAACGCT GATCCTGCAT ATGGAAGTCC CACTTTGGTG ACATTTCCTG 
2551 GCCATTCTTG TTTCCATTGT GTGGATGGTG GGTTGTGCCC ACTTCCTGGA 
2601 GTGAGACAGC TCCTGGTGTG TAGAATTCCC GGAGCGTCCG TGGTTCAGAG 
2651 TAAACTTGAA GCAGATCTGT GCATGCTTTT CCTCTGCAGC AATTGGCTCG 
2701 TTTCTCTTTT TTGTTCTCTT TTGATAGGAT CCTGTTTCCT ATGTGTGCAA 
2751 AATAAAAATA AATTTGGGCA AAAAAAAAAA AAAAAAAAA 


BLAST Results 


Entry HS671255 from database EMBL: 
human STS SHGC-11828. 
Length = 400 
Minus Strand HSPs: 

Score = 1822 (273.4 bits), Expect = 4.8e-76, P = 4.8e-76 
Identities = 382/397 (96%), Positives = 382/397 (96%) 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 10 bp to 525 bp; peptide length: 172 
Category: putative protein 
Classification: unset 


1 MRRQPAKVAA LLLGLLLECT EAKKHCWYFE GLYPTYYICR SYEDCCGSRC 
51 CVRALSIQRL WYFWFLLMMG VLFCCGAGFF IRRRMYPPPL IEEPAFNVSY 
101 TRQPPNPGPG AQQPGPPYYT DPGGPGMNPV GNSTAMAFQV PPNSPQGSVA 
151 CPPPPAYCNT PPPPYEQVVK AK 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 1 
PIR:S14970 extensin class I (clone w17-1) - tomato, N = 1, Score = 118, 


>PIR:S14970 extensin class I (clone w17-1) - tomato 
Length = 132 

HSPs: 

Score = 118 (17.7 bits), Expect = 2.3e-07, P = 2.3e-07 
Identities = 30/82 (36%), Positives = 35/82 (42%) 

Query:  87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 
           PPP    P     Y + PP P P    P P YY  P  P  +P       +   P  SP 
Sbjct:  32 PPPSPSPPP--PYYYKSPPPPSPSP--PPPYYYKSPPPPDPSPPPPYYYKSPPPPSPSPP 87 

Query: 147 GSVACPPPPAYCNTPPPP--YEQV 168 
                PPPP Y + PPPP  YE + 
Sbjct:  88 PPSPSPPPPTYSSPPPPPPFYENI 111 

Score = 104 (15.6 bits), Expect = 6.9e-06, P = 6.9e-06 
Identities = 28/78 (35%), Positives = 34/78 (43%) 

Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 
PP P + Y + PP P P P P YY P P +P ++ PP P 
Sbjct: 1 PPSPSPPPPY---YYKSPPPPSPSP--PPPYYYKSPPPPSPSP---PPPYYYKSPP-PPS 51 

Query: 147 GSVACPPPPAYCNTPPPP 164 
S PPPP Y +PPPP 
Sbjct: 52 PS---PPPPYYYKSPPPP 66 


Score = 102 (15.3 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 30/78 (38%), Positives = 33/78 (42%) 

Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 
PPP P Y + PP P P P P YY P P +P S + PP P 
Sbjct: 48 PPPSPSPPP--PYYYKSPPPPDPSP--PPPYYYKSPPPPSPSPPPPSPS-----PP-PPT 97 

Query: 147 GSVACPPPPAYCNTPPPP 164 
S PPPP Y N P PP 
Sbjct: 98 YSSPPPPPPFYENIPLPP 115 

Score = 95 (14.3 bits), Expect = 2.4e-04, P = 2.4e-04 
Identities = 24/61 (39%), Positives = 29/61 (47%) 

Query: 104 PPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPP 163 
PP+P P P P YY P P +P ++ PP P S PPPP Y +PPP 
Sbjct: 1 PPSPSP PPPYYYKSPPPPSPSP PPPYYYKSPP-PPSPS PPPPYYYKSPPP 49 

Query: 164 P 164 
P 
Sbjct: 50 P 50 

Score = 68 (10.2 bits), Expect = 4.2e+00, P = 9.8e-01 
Identities = 24/69 (34%), Positives = 29/69 (42%) 

Query: 87 PPPLIEEPAFNVSYTRQPP NPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPN 143 
PPP P Y PP +P P + P PP Y+ P P P + + PP 
Sbjct: 63 PPPPDPSPPPPYYYKSPPPPSPSPPPPSPSPPPPTYSSPPPPP--PFYENIPL PPV 116 

Query: 144 SPQGSVACPPPP 155 
S A PPPP 
Sbjct: 117 IGV-SYASPPPP 127 


Peptide information for frame 3 


ORF from 0 bp to 368 bp; peptide length: 123 
Category: questionable ORF 
Classification: unset 


1 GSHEAPACEG GGAAARAALG VHRSQKALLV FRRTLSNLLY MPLLRGLLWL 
51 QVLCAGPLHT EAVVLLVPSD DGRAFLLRSR LLHPEAHVPP AADRGASLQC 
101 VLHQAAPKSR PRSPAAGAAL LH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22k8, frame 1 

Report for DKFZphfbr2_22k8.1 

[LENGTH] 172 
[MW] 19194.47 
[pI] 8.77 
[KW] SIGNAL_PEPTIDE 23 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 27.33 % 

SEQ MRRQPAKVAALLLGLLLECTEAKKHCWYFEGLYPTYYICRSYEDCCGSRCCVRALSIQRL 
SEG xxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccceeeeccccccccccchhhhhhhhhh 
MEM , _ 

SEQ WYFWFLLMMGVLFCCGAGFFIRRRMYPPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYT 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccceeeeecccccccccccccceeeeccccccccccccccccccc 

MEM . MMMMMMMMMhlMMMMMMM 

SEQ DPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPPPYEQVVKAK 

SEG xxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccccccceeecccccccccccccccccccccccccccccccccc 

MEM 


(No Prosite data available for DKFZphfbr2_22k8.1) 



(No Pfam data available for DKFZphfbr2_22k8.1) 

Pedant information for DKFZphfbr2_22k8, frame 3 

Report for DKFZphfbr2_22k8.3 

[LENGTH] 122 
[MW] 12854.08 
[pI] 10.27 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 25.41 % 

SEQ GSHEAPACEGGGAAARAALGVHRSQKALLVFRRTLSNLLYMPLLRGLLWLQVLCAGPLHT 
SEG xxxxxxxxxxxxxxxx 
PRD ccccccccccchhhhhhhhccccchhhhhhhhhhhhhhhccccccchh 

SEQ EAVVLLVPSDDGRAFLLRSRLLHPEAHVPPAADRGASLQCVLHQAAPKSRPRSPAAGAAL 
SEG xxxxxxxxxxxxxxx 
PRD cceeeeeccccchhhhhhhhccccccccccccccchhhhhhhhhccccccccchhhhhhc 

SEQ LH 
SEG 
PRD cc 


(No Prosite data available for DKFZphfbr2_22k8.3) 
(No Pfam data available for DKFZphfbr2_22k8.3) 
DKFZphfbr2_23b10 


group: nucleic acid management 

DKFZphfbr2_23b10 encodes a novel 580 amino acid protein with strong similarity to rat RNA 
helicase HEL117. 

HEL117 is a DEAD/H box helicase, which co-localises with a splicing factor and thus seems to 
be involved in splicing. 

The new protein can find application in modulation of splicing. 


strong similarity to rat RNA helicase HEL117 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2905 bp 

Poly A stretch at pos. 2885, no polyadenylation signal found 


1 GGGGGCTCCG CTCCGCACCA CCAACCCCGG GCCGCAGTCC TGACGAGCGG 
51 GTCAGGGCTT GTCGGGCGGA AGCCTGGCCT GGAGCCTGGA AGGGGGAGAC 
101 GGCCCGAGCG GGAGCGGGAG CGGACGCGGC CTCAGTCCTG CGCGGAATAT 
151 TGAAGGATGT TTGTTCCAAG ATCTCTAAAA ATCAAGAGGA ATGCTAATGA 
201 TGATGGCAAA AGTTGTGTGG CTAAGATAAT TAAACCAGAC CCAGAAGACC 
251 TTCAGTTGGA CAAAAGCAGA GATGTTCCCG TTGATGCTGT AGCTACAGAA 
301 GCAGCCACAA TAGACAGGCA CATCAGCGAA TCATGCCCTT TCCCCAGCCC 
351 AGGTGGCCAG TTGGCAGAGG TTCATTCAGT AAGTCCCGAG CAGGGTGCGA 
401 AGGACAGCCA TCCTTCTGAA GAGCCCGTTA AGTCATTTTC CAAAACACAG 
451 CGCTGGGCAG AACCAGGGGA ACCCATCTGT GTTGTCTGTG GTCGTTATGG 
501 AGAGTATATC TGTGATAAGA CAGATGAAGA TGTGTGTAGT TTGGAGTGTA 
551 AAGCGAAACA TCTTCTACAA GTTAAGGAAA AGGAAGAGAA ATCAAAACTC 
601 AGCAATCCAC AGAAGGCTGA TTCTGAGCCA GAGTCTCCAC TGAATGCTTC 
651 CTATGTCTAC AAAGAGCACC CCTTTATTTT GAACCTTCAG GAAGACCAGA 
701 TTGAAAATCT TAAACAGCAG CTGGGAATTT TAGTTCAAGG GCAAGAAGTC 
751 ACCAGGCCCA TTATTGACTT TGAACATTGT AGTCTCCCTG AGGTCTTAAA 
801 TCACAACTTG AAGAAATCAG GCTATGAGGT GCCAACTCCC ATTCAAATGC 
851 AGATGATTCC TGTGGGACTT CTGGGAAGAG ACATTCTGGC CAGTGCAGAT 
901 ACTGGCTCAG GAAAAACAGC TGCTTTTCTT CTTCCTGTTA TCATGCGAGC 
951 TTTATTCGAG AGCAAAACTC CATCTGCGCT CATTCTTACA CCAACCAGAG 
1001 AGTTAGCCAT TCAGATAGAG AGACAAGCTA AAGAATTGAT GAGTGGCCTG 
1051 CCACGCATGA AAACTGTGCT TCTTGTAGGG GGCTTACCCT TACCCCCACA 
1101 GCTTTATCGT CTGCAACAAC ATGTTAAGGT TATCATAGCA ACCCCTGGGC 
1151 GACTTCTGGA TATAATAAAG CAGAGCTCTG TAGAACTCTG TGGTGTAAAG 
1201 ATTGTGGTAG TAGATGAAGC TGATACCATG TTAAAGATGG GTTTTCAACA 
1251 ACAAGTGCTT GACATTTTGG AAAACATTCC TAATGATTGT CAGACCATTT 
1301 TGGTTTCAGC CACAATTCCA ACTAGCATAG AACAGCTAGC AAGCCAGCTT 
1351 CTGCATAATC CTGTGAGAAT TATCACTGGA GAAAAGAACC TACCTTGTGC 
1401 CAATGTACGT CAGATTATTT TGTGGGTAGA AGACCCAGCC AAAAAGAAAA 
1451 AATTATTTGA AATTTTAAAT GATAAGAAAC TCTTTAAGCC TCCAGTGTTA 
1501 GTATTTGTGG ACTGCAAACT AGGAGCAGAT CTTTTGAGTG AAGCCGTTCA 
1551 GAAAATCACA GGGCTGAAAA GCATATCTAT ACATTCGGAG AAGTCGCAAA 
1601 TAGAAAGGAA AAACATATTG AAGGGATTAC TTGAAGGAGA CTATGAAGTT 
1651 GTAGTGAGCA CAGGAGTCTT GGGACGAGGC CTAGACTTGA TCAGTGTCAG 
1701 GCTGGTTGTC AATTTTGATA TGCCTTCAAG TATGGATGAG TATGTCCATC 
1751 AGGAAAATAC CTACAAGTCT ACTTGGAGGA ATCCCCAGCA TTTTCAACAG 
1801 GATGTCAGAA TGACCTTGGG CTATGTTGGC AAAGCACAAT CGGAAGAAGA 
1851 CAACCAATTG AAGGTCAAAC TAGGCCTTAA AAAAAATTGT TCTTCCTAAA 
1901 TGAAACTTTA TGTAAGACCC AAGCTTCCTT TATGTAAAAA TAGGATACTC 
1951 ACTAGGCTTT GGGGCTGACA ATGGTTTTTA AATCTTGCTA ATCTTCCCTG 
2001 GAATGAAACC AGCATGACTC AAAGAGAAAA AGAGAGTCTA TAATATTTTC 
2051 TAATCCCTGA GTTCTTTTCT TTATATATTA AAAAGGATTA TTAGGCTGGG 
2101 TGTGGTGGCT CACGCCTGTA ATCCCAGCAC TTTGGGAGGC CGAGGGGAGT 
2151 GGATCACCTG AGTTCGAGAC CAGCCTAACC AACATGGAGA AACCCTGTCT 
2201 CTACTAAAAA TACAAAATTA GCCAGGCGTG GTGGCGCATG CCTGTAATCC 
2251 CAGCTACTCA GGAGGCTACA GCAGGAGAAT TGCTTGAACT CGGGAGGCAG 
2301 AGCCAAGATC GCACCACTGC ACTCCAGCCT GGGCAACAAG AGTGAAACTC 
2351 TGTCTCAAAA TAATATTAAT GATAATAATA ATAATAATAA TAGGGATTAC 
2401 TTGCATAATT GTTCTTTTAA AATTATTGGC AGTATTGCTG AATGTATTTA 
2451 GATTTTTTCA CCAAGTGACA ACAACTGAAT TCATAAAGAT TCATCAACAA 
2501 GACCTGATAA AAAAAAATGT AAGCATATTA TAGTGGATAC TTCCAAGACT 
2551 CTTGGTCTAA CATGTATTAG AAAGCAGAAG GAGCCCAGGC ACAGGGGCTC 
2601 CCGCCGGTAA TCCCAAAGCT TTGGGAAGCC AAGGCAGGTG GATCGCTTGA 
2651 GCTCAGGAGT TAGAGACCAG CCTGGGCAAC ATGGTGAAAT CCCGTCACCA 
2701 CAAAAAAATG CAAAAATTAA CTGGGCGTGG TGGCATGCAC CTGTAGTCCC 

2751 AGCTACTCTG GAGGCTGAGG TGAGGGGAAT CACCTGAGCC GGGGGAATCA 

2801 CCTGAGCCCA GGGAAGTTGA GGCTGCTGTG AGCCATGGTC ATGACACTGC 

2851 CCTCCAGCCT GGACAACAGA TTGAGACCCT GTCTCAAAAA AAAAAAAAAA 
2901 AAAAA 


BLAST Results 


No BLAST result 


Medline entries 


Medline: 

A putative mammalian RNA helicase with an arginine-serine-rich 
domain 


Peptide information for frame 1 


ORF from 157 bp to 1896 bp; peptide length: 580 
Category: strong similarity to known protein 
Prosite motifs: ATP_GTP_A (247-255) 
LEUCINE_ZIPPER (298-320) 


1 MFVPRSLKIK RNANDDGKSC VAKIIKPDPE DLQLDKSRDV PVDAVATEAA 
51 TIDRHISESC PFPSPGGQLA EVHSVSPEQG AKDSHPSEEP VKSFSKTQRW 
101 AEPGEPICVV CGRYGEYICD KTDEDVCSLE CKAKHLLQVK EKEEKSKLSN 
151 PQKADSEPES PLNASYVYKE HPFILNLQED QIENLKQQLG ILVQGQEVTR 
201 PIIDFEHCSL PEVLNHNLKK SGYEVPTPIQ MQMIPVGLLG RDILASADTG 
251 SGKTAAFLLP VIMRALFESK TPSALILTPT RELAIQIERQ AKELMSGLPR 
301 MKTVLLVGGL PLPPQLYRLQ QHVKVIIATP GRLLDIIKQS SVELCGVKIV 
351 VVDEADTMLK MGFQQQVLDI LENIPNDCQT ILVSATIPTS IEQLASQLLH 
401 NPVRIITGEK NLPCANVRQI ILWVEDPAKK KKLFEILNDK KLFKPPVLVF 
451 VDCKLGADLL SEAVQKITGL KSISIHSEKS QIERKNILKG LLEGDYEVVV 
501 STGVLGRGLD LISVRLVVNF DMPSSMDEYV HQENTYKSTW RNPQHFQQDV 
551 RMTLGYVGKA QWEEDNQLKV KLGLKKNCSS 
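
The ATP_GTP_A motif listed for this peptide corresponds to the PROSITE P-loop pattern [AG]-x(4)-G-K-[ST]. The Python sketch below shows one way to locate it with a regular expression. The function name `find_p_loop` is hypothetical, and the 20-residue fragment (residues 241 to 260 of the peptide above) is chosen here only to keep the example short.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loop(fragment: str, first_residue: int) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of P-loop matches.

    first_residue is the peptide position of fragment[0]; the end
    position returned is the last matched residue (inclusive).
    """
    return [
        (m.start() + first_residue, m.end() - 1 + first_residue)
        for m in P_LOOP.finditer(fragment)
    ]


# Residues 241-260 of the frame 1 peptide, copied from the listing.
fragment = "RDILASADTGSGKTAAFLLP"
hits = find_p_loop(fragment, 241)
```

With inclusive numbering the eight matched residues (ADTGSGKT) run from 247 to 254; the listing's 247-255 appears to quote the end position one past the last matched residue.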

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23bl0, frame 1 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 615, P = 1.6e-60 

TREMBL:AB018344_1 gene: "KIAA0801"; product: "KIAA0801 protein"; Homo 
sapiens mRNA for KIAA0801 protein, complete cds., N = 1, Score = 615, P 

TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid 
F01F1., N = 2, Score = 365, P = 1.9e-58 

TREMBL:AF083255_1 product: "RNA helicase-related protein"; Homo 
sapiens RNA helicase-related protein mRNA, complete cds., N = 2, Score 
= 556, P = 1.5e-57 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 591, P = 1.6e-57 


>PIR:A57514 RNA helicase HEL117 - rat 
Length = 1,032 

HSPs: 

Score = 615 (92.3 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 140/394 (35%), Positives = 236/394 (59%) 

Query: 144 EKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQEDQIENLKQQL-GILVQGQEVTRPI 202 
KL P P ++ Y E P + + ++++ + ++ GI V+G+ +PI 
Sbjct: 313 KQRKLLEPVDHGKIEYEPFRKNF-YVEVPELAKMSQEEVNVFRLEMEGITVKGKGCPKPI 371 

Query: 203 IDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAFLLPV- 261 

+ C + + ++LKK GYE PTPIQ Q IP + GRD++ A TGSGKT AFLLP+ 
Sbjct: 372 KSWVQCGISMKILNSLKKHGYEKPTPIQTQAIPAIMSGRDLIGIAKTGSGKTIAFLLPMF 431 

Query: 262 --IM--RALFESKTPSALILTPTRELAIQIERQAKELMSGLPRMKTVLLVGGLPLPPQLY 317 
IM R+L E + P A+I+TPTRELA+QI ++ K+ L ++ V + GG + Q+ 
Sbjct: 432 RHIMDQRSLEEGEGPIAVIMTPTRELALQITKECKKFSKTLG-LRVVCVYGGTGISEQIA 490 

Query: 318 RLQQHVKVIIATPGRLLDIIKQSS---VELCGVKIVVVDEADTMLKMGFQQQVLDILENI 374 
L++ ++I+ TPGR++D++ +S L V VV+DEAD M MGF+ QV+ I++N+ 
Sbjct: 491 ELKRGAEIIVCTPGRMIDMLAANSGRVTNLRRVTYVVLDEADRMFDMGFEPQVMRIVDNV 550 

Query: 375 PNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQIILWVEDPAKKKKLF 434 
D QT++ SAT P ++E LA ++L P+ + G +++ C++V Q ++ +E+ K KL 
Sbjct: 551 RPDRQTVMFSATFPRAMEALARRILSKPIEVQVGGRSVVCSDVEQQVIVIEEEKKFLKLL 610 

Query: 435 EILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEG 494 
E+L + V++FVD + AD L + + + + +S+H Q +R +I+ G 
Sbjct: 611 ELLGHYQE-SGSVIIFVDKQEHADGLLKDLMRAS-YPCMSLHGGIDQYDRDSIINDFKWG 668 

Query: 495 DYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQ 532 
+++V+T V RGLD+ + LVVN+ P+ ++YVH+ 
Sbjct: 669 TCKLLVATSVAARGLDVKHLILVVNYSCPNHYEDYVHR 706 

Score = 37 (5.6 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 13/36 (36%), Positives = 17/36 (47%) 

Query: 132 KAKHLLQVKEKEE---KSKLSNPQKADSEPESPLNA 164 
KA++ + KEK E SK K D E E +A 
Sbjct: 113 KAENRSRSKEKAEGGDSSKEKKKDKDDKEDEKEKDA 148 


Pedant information for DKFZphfbr2_23b10, frame 1 


Report for DKFZphfbr2_23b10.1 


[LENGTH] 580 
[MW] 64572.24 
[pI] 6.13 
[HOMOL] TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid F01F1, 8e-61 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 5e-53 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-49 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-49 
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-46 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 3e-43 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 4e-39 
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-35 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 6e-34 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 3e-32 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 8e-30 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 5e-23 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-16 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-11 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-06 
[BLOCKS] BL00115B Eukaryotic RNA polymerase II heptapeptide repeat proteins 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 6e-53 
[PIRKW] RNA binding 9e-52 
[PIRKW] DEAD box 2e-43 
[PIRKW] transmembrane protein 1e-21 
[PIRKW] DNA binding 5e-48 
[PIRKW] ATP 4e-57 
[PIRKW] purine nucleotide binding 2e-43 
[PIRKW] P-loop 4e-57 
[PIRKW] hydrolase 6e-42 
[PIRKW] protein biosynthesis 2e-43 
[PIRKW] ATP binding 2e-50 
[SUPFAM] WW repeat homology 1e-49 
[SUPFAM] translation initiation factor eIF-4A 2e-43 
[SUPFAM] DEAD/H box helicase homology 4e-57 
[SUPFAM] recQ helicase homology 8e-06 
[SUPFAM] unassigned DEAD/H box helicases 4e-57 
[SUPFAM] ATP-dependent RNA helicase DBP1 2e-53 
[SUPFAM] ATP-dependent RNA helicase DHH1 6e-40 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-49 
[SUPFAM] Bloom's syndrome helicase 8e-06 
[PROSITE] ATP_GTP_A 1 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 3.10 % 

SEQ MFVPRSLKIKRNANDDGKSCVAKIIKPDPEDLQLDKSRDVPVDAVATEAATIDRHISESC 

SEG 

PRD ccccceeeeccccccccceeeeeeeeccccceeecccccccccchhhhhhhhhhhh^ 

SEQ PFPSPGGQLAEVHSVSPEQGAKDSHPSEEPVKSFSKTQRWAEPGEPICVVCGRYGEYICD 
SEG 

PRD cccccccceeeeccccccccccccccccccccccccccccccccccceeeeccccceeec 

SEQ KTDEDVCSLECKAKHLLQVKEKEEKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQED 

PRD cccccccchhhhhhhhhhhhhhccccccccccccccccccccccceeecccc^^ 

SEQ QIENLKQQLGILVQGQEVTRPIIDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLG 

SEG 

PRD h^^ihhhhhheeeeccccccccccccccccchhhhhhhhhhhcccccccccccc^ 

SEQ RDILASADTGSGKTAAFLLPVIMRALFESKTPSALILTPTRELAIQIERQAKELMSGLPR 
SEG ^ 

PRD cceeeeeccccccceeeehhhhhhhhcccccc4eeeecchhhhhhhh^ 

SEQ MKTVLLVGGLPLPPQLYRLQQHVKVIIATPGRLLDIIKQSSVELCGVKIVVVDEADTMLK 
SEG . . .xxxxxxxxxxxxxxxxxx 

PRD eeeeeeecccccchhhhhhhhheeeeeeccccchhhhhhheeeee^eeeee^ 

SEQ MGFQQQVLDILENIPNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQI 

PRD cccchhhhhhhhhcccccceeeeecccchhhhhhhhhhhhceeeeeeecccccc^^ 

SEQ ILWVEDPAKKKKLFEILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKS 
SEG 

PRD eeecccchhhhhhhhhhhhhccccceeeeeeecccchhhhhhhhhhhhccceeec^^ 

SEQ QIERKNILKGLLEGDYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQENTYKSTW 
SEG •••••■•••••.••«.....,... 

PRD hhhhhhhhhhhccccceeeeehhhhhhcccceeeeeeeeecccccccceeeecc^ 

SEQ RNPQHFQQDVRMTLGYVGKAQWEEDNQLKVKLGLKKNCSS 
SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhhhhcccccc 


Prosite for DKFZphfbr2_23b10.1 

PS00001 163->167 ASN_GLYCOSYLATION PDOC00001 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 97->100 PKC_PHOSPHO_SITE PDOC00005 
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005 
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005 
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005 
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005 
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005 
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00006 389->393 CK2_PHOSPHO_SITE PDOC00006 
PS00006 480->484 CK2_PHOSPHO_SITE PDOC00006 
PS00006 524->528 CK2_PHOSPHO_SITE PDOC00006 
PS00007 489->497 TYR_PHOSPHO_SITE PDOC00007 
PS00008 66->72 MYRISTYL PDOC00008 
PS00008 80->86 MYRISTYL PDOC00008 
PS00008 195->201 MYRISTYL PDOC00008 
PS00008 250->256 MYRISTYL PDOC00008 
PS00008 490->496 MYRISTYL PDOC00008 
PS00008 573->579 MYRISTYL PDOC00008 
PS00017 247->255 ATP_GTP_A PDOC00017 
PS00029 298->320 LEUCINE_ZIPPER PDOC00029 


Pfam for DKFZphfbr2_23b10.1 


HMM_NAME. DEAD and DEAH box helicases 

HMM ♦ gLpPWI LRn I y eMGFEkPTP IQQqAI Pi I LeGRDVMACAQTGSGKTAAF 

+LP+ + N+++ G+E PTPIQ+Q IP+ L GRD++A A TGSGKTAAF 
Query 209 SLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAF 257 

HMM 1 1 PMLQHI DwdPWpqpPQd Pr ALI LA PTRELAMQI QEEcRkFg kHMng I R 
L+P++ + + + ++P ALI L+PTRELA+QI +++++++ + ++ ++ 
Query 258 LLPVIMRALFES--KTPS ALILTPTRELAIQIERQAKELMSGLPRMK 302 

HMM ImcIYGGtnMRdQMRniLeRGpPHIVIATPGRLIDHIERgtldLDrleMLV 
++++GG+++ +Q+ +L++ + ++IATPGRL+D+I++ ++ L ++++V 
Query 303 TVLLVGGLPLPPQLYRLQQHV-KVIIATPGRLLDIIKQSSVELCGVKIVV 351 

HMM MDEADRMLDMGFIDQIRrlMrqlPMpwNRQTMMFSATMPdelqELARrEM 
DEAD ML MGF++Q+ +I+ IP + QT++ SAT+P +I++LA ++ 
Query 352 VDEADTMLKMGFQQQVLDILENIP--NDCQTILVSATIPTSIEQLASQLL 399 

HMM RNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

+NP+RI+ ++++L N++Q++ +VE + K +L+++++ 
Query 400 HNPVRIITGEKNLPCA-NVRQIILWVE-DPAKKKKLFEILN 438 


HMM_NAME Helicases conserved C-terminal domain 

HMM *EiieeWLknl.GlrvmYIHGdMpQeERdelMddFNnGEynVLIcTDVgg 
++L+E ++ G++ ++IH+ ++Q ER +G+Y V ++T V+G 
Query 458 DLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEGDYEVVVSTGVLG 506 

HMM RGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG* 
RG+D+++V++V+N+DMP +++ Y++ + T + 
Query 507 RGLDLISVRLVVNFDMPSSMDEYVH-QENTYKST 539 
DKFZphfbr2_23b21 


group: signal transduction 

DKFZphfbr2_23b21 encodes a novel 193 amino acid protein which is nearly identical to bovine 
neurocalcin. 

Neurocalcin is a protein with three putative Ca(2+)-binding domains (EF-hands). It is 
differentially expressed in the central nervous system, retina and adrenal gland. Similarity 
to recoverin indicates involvement in Ca2+ dependent activation of guanylate cyclase. 

The new protein can find application in modulating/blocking the guanylate cyclase-pathway. 

nearly identical to bovine neurocalcin 

complete cds complete cDNA 
EST hits 


Sequenced by AGOWA 

Locus: /map="574.6 cR from top of Chr8 linkage group" 
Insert length: 3300 bp 

Poly A stretch at pos. 3279, polyadenylation signal at pos. 3249 


1 GGGGAGAATC TGGTGGATGC TGGACCTTGC TGCTGCTGCT ACTGCTGTTT 
51 CCAGGGGCTG CAGAGCATGG ACTGTTAAAT CTTGCACTTC TTCTGAGTGA 
101 GCTGAATTCT TGCCGCCAGG ATGGGGAAAC AGAACAGCAA GCTGCGCCCG 
151 GAGGTCATGC AGGACTTGCT GGAAAGCACA GACTTTACAG AGCATGAGAT 
201 CCAGGAATGG TATAAAGGCT TCTTGAGAGA CTGCCCCAGT GGACATTTGT 
251 CAATGGAAGA GTTTAAGAAA ATATATGGGA ACTTTTTCCC TTATGGGGAT 
301 GCTTCCAAAT TTGCAGAGCA TGTCTTCCGC ACCTTCGATG CAAATGGAGA 
351 TGGGACAATA GACTTTAGAG AATTCATCAT CGCCTTGAGT GTAACTTCGA 
401 GGGGGAAGCT GGAGCAGAAG CTGAAATGGG CCTTCAGCAT GTACGACCTG 
451 GACGGAAATG GCTATATCAG CAAGGCAGAG ATGCTAGTGA TCGTGCAGGC 
501 AATCTATAAG ATGGTTTCCT CTGTAATGAA AATGCCTGAA GATGAGTCAA 
551 CCCCAGAGAA AAGAACAGAA AAGATCTTCC GCCAGATGGA CACCAATAGA 
601 GACGGAAAAC TCTCCCTGGA AGAGTTCATC CGAGGAGCCA AAAGCGACCC 
651 GTCCATTGTG CGCCTCCTGC AGTGCGACCC GAGCAGTGCC GGCCAGTTCT 
701 GAGCCCTGCG CCCACCAATC GAATTGTAGA GCTGCTTGTG TTCCCTTTTG 
751 ATTCTTCTTT TTAACAATTT tTTTTTTTTT TTGCCAAACA ATATCAATGG 
801 TGATGCCGTC CCCTGTGCGG TCTGATGCGC cttcctccgt gacgccttc^ 

851 gcctcttttg tcgtggatgc ttcgtgggaa tgcccagagc cccagtgtgc 
901 ttgtggagag catggacaga cttcgtggtg ttcattgttt gatgattttt 
951 aatcgttact attatttctt tttattctaa tgtctctgtt ctaaaacgta 

1001 AGACTCGGGG GTTGGGGCAA AAGAAGGGAA ACCCATCCAG TCCTGTGATT 
1051 CTATTGCAAG CTTCAAGGGG CTTTTGTTTG AAAGACAAAA CTCCCCACCT 
1101 GGGTCTGTTG TCACACGTGC CGTAGGGGTG ATGGATGGCA CCGGATGCTG 
1151 GATTCCCCAA GAACAAGTTA CCCTCTGGGG TGAGGCTATT CCAGCGAGCT 
1201 GGGACATTTC CCCATGGGGG CCCACTCCCC TCTCTTCCCC AGCAGGCTGT 
1251 AGTTTCTAAG CTGTGAACAT TTCAAGATAA ATTAACAGAG GAGAGGAAAA 
1301 AGATGGCTCA GCTATTTTTT CACAGGTTTA CACTAGTTGA GCTAATATGC 
1351 GTGTCTTTGG AAATTAAACA CAAATGGTAA catattccaa aaccagaccc 
1401 atcttgttgc ctattgtgat aaaataaaaa gacggctgta tataacatat 
1451 tgggtaatgc agaccaaatt aagtgttttg ccttgtttaa atgaaatgca 
1501 tgtttagtga gcactaatac aatcttattc cagaagactg tttttagtag 
1551 cttattgtga agtaagacaa ctataatgaa tgtctgtctt gtttggaagt 
1601 catatctgtc tttgcacaaa tgtaccaatc gacaagtata ttttatatat 
1651 tccataaaaa tacaaagtaa ccctgactag ggcccaactt taattttgaa 
1701 tgcatttcca gagtggccat gcctagaggg cagatgcaga gcaggtggta 

1751 GTGGGACAGG ACAATTGGAG CACAGGAATG TTAACATGTA TGACAGGGGA 
1801 CCAGTAGGGT GGTTTCCCTC TCAGGCCCAG CAGCCCATTG ACAGCATTAG 
1851 ACTGGCGGCA TGGTGCTTTT CTGAGCAGAT CAATACTCTG CAGACTCGAA 
1901 AAAACATCAC ATACATTCTT GGAACTTCCC AGTGGTTTAA TCTATGTGCA 
1951 TGGTTAGGGA GCCAGGCCTG GAATATTCAG TTTCCCTGCC CCTGTTAAAG 
2001 AATCAGAGGT TGGGCAGTCA TCAAATTCAT CATAAAGACA TGGGCAAGTG 
2051 TGTCTGTGGT TTCCAAGGCC CCCCTATGGA GAATCCAAAA gtattttcca 

2101 ttgccgtgct ctttgaatgc agacttctat ttccagaagt gacagcacaa 
2151 gtctgagttg ctgtttggtc tggtgacctc agacacacta atttgaattg 
2201 aaagctaaga gtaaaaattt gctggttaca ggcgagtcat actcttgcaa 
2251 gtagttagca aagggaggcc caaattctca aggttgttga tggggaactt 
2301 gccactaaga gaaggcagag aggtccctag tgggtatatt tgctgccaag 
2351 ccacttgcca aagaagagga accacagaaa gagagacatc atgaccagga 
2401 gaaaaatgtg actagacatg ctaacctcca ggtttttata tatgacttga 
2451 gtctgctgta attggcagca gaaatccaaa tttgtatggt AGACCAAAAA 
2501 gaaccaaatc catagggtga aattttgaga cctagactct gtaaaaataa 
2551 TCCTAGTCTT CCTCCAGGGG TCAGTTCCTC ACAGTGGTTC TGTACCAAAA 
2601 CTTGCCAAAT TCCTCCATGG CCAAGTGTTA AAATCTGTGT TTGGAAAATA 
2651 GCGAATTAAC CTAAGACACA GAAGGCAGAC TGGGTGAGGA GACCTAGCAT 
2701 GCCCTATTGG CAGTGCTCAG GAGCTGCATC CCACTTTTCC CTGCTCTGAA 
2751 TCGAAGTCCT AGTTCCTTCC TTTGATTCTC CTTTGGTAGG TGGAATCAGT 
2801 TAATGTTTTG AGAAACCTGC CTGGGCTCTG CCCTTAGTCA TGACATC7CG 
2851 CTGAGCCAGA CCCACTCTGT TCCTTGGAAC CTAGAGCTGG AGTGAGGAGT 
2901 AGAGGTCTCC GGCTATTCCA GAAAGAAAAG TGAGCCACAT GCAGGCTGAT 
2951 GAATGCCGAC ACTTCCAGAA TGTATAGAAA TAGTCCCTGT CCTGGCCTGC 
3001 CACTGACCCT GTCTGTATTT TCTCGGAGGT TGTTTTTCTC CTTCTCCTTC 
3051 CCAGGAAGGT CTTTGTATGT CGAATCCAGT GCACTCAAGT TTGGCCAAGG 
3101 GACTCCACAG CACCCAGAGG ACTGCATGCC TCAAGGTTTA TGTCACTCCT 
3151 CTGCTGGGCT GTTCATTGTC ATTGCTGTGT TCAGGGACCT TTGGAAATAA 
3201 AACCTGTTCT GTCCCAAATA AAACCAGCCT GTGATGTTCA AGGGACTGGA 
3251 ATAAAGTGGC TTACGACCTG AAGGATTCTA AAAAAAAAAA AAAAAAAAAA 


BLAST Results 


Entry HS431350 from database EMBL: 
human STS WI-15914. 
Score = 1308, P = 3.1e-53, identities = 276/285 

Entry HSG19929 from database EMBL: 
human STS A002C26. 
Score = 926, P = 1.5e-35, identities = 186/187 

Entry AF052142 from database EMBL: 

Homo sapiens clone 24665 mRNA sequence. 

Score = 7378, P = 0.0e+00, identities = 1482/1487 

3' UTR 


Medline entries 


93247712: 

Neurocalcin family: a novel calcium-binding protein abundant in bovine 
central nervous system. 

94045365: 

Distinct regional localization of neurocalcin, a Ca(2+)-binding 
protein, in the bovine adrenal gland. 

96407688: 

Crystallization and preliminary X-ray crystallographic studies of 
recombinant bovine neurocalcin delta. 

96066284: 

Distribution pattern of three neural calcium-binding proteins (NCS-1, 
VILIP and recoverin) in chicken, bovine and rat retina. 


Peptide information for frame 1 


ORF from 121 bp to 699 bp; peptide length: 193 
Category: strong similarity to known protein 
Prosite motifs: EF_HAND (73-86) 
EF_HAND (109-122) 
EF_HAND (157-170) 


1 MGKQNSKLRP EVMQDLLEST DFTEHEIQEW YKGFLRDCPS GHLSMEEFKK 
51 IYGNFFPYGD ASKFAEHVFR TFDANGDGTI DFREFIIALS VTSRGKLEQK 
101 LKWAFSMYDL DGNGYISKAE MLVIVQAIYK MVSSVMKMPE DESTPEKRTE 
151 KIFRQMDTNR DGKLSLEEFI RGAKSDPSIV RLLQCDPSSA GQF 

BLASTP hits 

Entry JH0616 from database PIR: 
neurocalcin (clone pCalN) - bovine 
Score = 1001, P = 5.2e-101, identities = 192/193, positives = 192/193 
Entry GGU91630_1 from database TREMBL: 

product: "neurocalcin"; Gallus gallus neurocalcin mRNA, complete cds 
Score = 998, P = 1.1e-100, identities = 191/193, positives = 192/193 

Entry NECD_BOVIN from database SWISSPROT: 
NEUROCALCIN DELTA. 

Score = 996, P = 1.8e-100, identities = 191/192, positives = 191/192 

Entry S47565 from database PIR: 
BDR-1 protein - human 

Score = 934, P = 6.6e-94, identities = 174/193, positives = 187/193 

Entry 150676 from database PIR: 

gene Rem-1 protein - chicken >TREMBL:GGREM1_1 gene: "Rem-1"; G. gallus 
rem-1 mRNA 

Score = 933, P = 8.4e-94, identities = 174/193, positives = 186/193 


Alert BLASTP hits for DKFZphfbr2_23b21, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23b21, frame 1 


Report for DKFZphfbr2_23b21.1 


[LENGTH] 193 
[MW] 22215.30 
[pI] 5.35 
[HOMOL] PIR:JH0616 neurocalcin (clone pCalN) - bovine 1e-109 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YDR373w] 3e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 10.02.99 other morphogenetic activities [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YBR109c] 0.001 
[BLOCKS] BL00018 
[SCOP] d1rec 1.34.1.5.18 Recoverin (bovine (Bos taurus) 8e-55 
[SCOP] d1jsa 1.34.1.5.17 Recoverin (human (Homo sapiens) 6e-58 
[SCOP] d1tcob_ 1.34.1.5.16 Calcineurin regulatory subunit (B-chain 1e-06 
[SCOP] d2mysc_ 1.34.1.5.15 Myosin Regulatory Chain (chicken (Gallu 2e-29 
[SCOP] d1scmc_ 1.34.1.5.14 Myosin Regulatory Chain (bay scallo 5e-33 
[SCOP] d2mysb_ 1.34.1.5.13 Myosin Essential Chain (chicken (Gallu 4e-26 
[SCOP] d1scmb_ 1.34.1.5.12 Myosin Essential Chain (bay scallo 6e-27 
[SCOP] d1clm 1.34.1.5.11 Calmodulin (Paramecium tetraurelia 1e-15 
[SCOP] d4cln 1.34.1.5.10 Calmodulin (Drosophila melanogaster 2e-16 
[SCOP] d1cfc 1.34.1.5.9 Calmodulin (African frog (Xenopus laevis) 2e-16 
[SCOP] d1ahr 1.34.1.5.8 Calmodulin (chicken gallus gallus 4e-16 
[SCOP] d3cln 1.34.1.5.7 Calmodulin (rat (Rattus rattus) 2e-16 
[SCOP] d1trcb_ 1.34.1.5.6 Calmodulin (bovine (Bos taurus) 8e-08 
[SCOP] d1cll 1.34.1.5.5 Calmodulin (human (Homo sapiens) 2e-16 
[SCOP] d1rtp1_ 1.34.1.4.5 Parvalbumin (rat (Rattus rattus) 8e-06 
[SCOP] d5tnc 1.34.1.5.2 Troponin C (turkey (Meleagris gallopavo 3e-13 
[SCOP] d1pvaa_ 1.34.1.4.3 Parvalbumin (pike (Esox lucius) 6e-06 
[SCOP] d1tnp 1.34.1.5.1 Troponin C (chicken (Gallus gallus) 9e-11 
[EC] 2.7.1.107 Diacylglycerol kinase 2e-08 
[PIRKW] blocked amino end 1e-100 
[PIRKW] phosphotransferase 2e-08 
[PIRKW] duplication 4e-17 
[PIRKW] tandem repeat 7e-06 
[PIRKW] heterodimer 4e-17 
[PIRKW] heart 6e-09 
[PIRKW] zinc 2e-08 
[PIRKW] serine/threonine-specific protein kinase 1e-06 
[PIRKW] muscle contraction 1e-08 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] ATP 2e-08 
[PIRKW] skeletal muscle 6e-09 
[PIRKW] signal transduction 1e-91 
[PIRKW] protein kinase 2e-08 
[PIRKW] calcium binding 1e-100 
[PIRKW] alternative splicing 2e-13 
[PIRKW] methylated amino acid 1e-09 
[PIRKW] thin filaments 1e-08 
[PIRKW] lipoprotein 1e-101 
[PIRKW] cardiac muscle 6e-09 
[PIRKW] muscle 6e-09 
[PIRKW] myristylation 1e-100 
[PIRKW] EF hand 1e-101 
[PIRKW] retina 2e-51 
[SUPFAM] calcium-dependent protein kinase 2e-08 
[SUPFAM] unassigned calmodulin-related proteins 8e-41 
[SUPFAM] spec-related protein LpS1 7e-06 
[SUPFAM] calmodulin repeat homology 1e-101 
[SUPFAM] human diacylglycerol kinase 2e-08 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-08 
[SUPFAM] protein kinase homology 2e-08 
[SUPFAM] calmodulin 1e-101 
[PROSITE] EF_HAND 3 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PFAM] EF hand 
[KW] All_Alpha 
[KW] 3D 


SEQ MGKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGD 
1rec- HHHHHHHHHTTTTCCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTC 

SEQ ASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKAE 
1rec- HHHHHHHHHHHH CEEEHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHH 

SEQ MLVIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIV 
1rec- HHHHHHHHHHCCTTGGGCTTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHH 

SEQ RLLQCDPSSAGQF 
1rec- HHHCCCH 


Prosite for DKFZphfbr2_23b21.1 

PS00005 92->95 PKC_PHOSPHO_SITE PDOC00005 
PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00006 23->27 CK2_PHOSPHO_SITE PDOC00006 
PS00006 44->48 CK2_PHOSPHO_SITE PDOC00006 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 117->121 CK2_PHOSPHO_SITE PDOC00006 
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006 
PS00006 158->162 CK2_PHOSPHO_SITE PDOC00006 
PS00006 165->169 CK2_PHOSPHO_SITE PDOC00006 
PS00018 73->86 EF_HAND PDOC00018 
PS00018 109->122 EF_HAND PDOC00018 
PS00018 157->170 EF_HAND PDOC00018 


Pfam for DKFZphfbr2_23b21 . 1 


HMM_NAME EF hand 

HMM *MFrmMDkDGDGyIDFEEFmeMMkem* 

+FR +D +GDG+IDF EF+ +++ 
Query 68 VFRTFDANGDGTIDFREFIIALSVT 92 

30.75 100 128 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus : 
Query *EIqEMFrmMDkDGDGyI DFEEFmeMMkem* 

++++F+M+D DG+GYI++ E++++++++ 

dkfzphfbr2 100 KLKWAFSMYDLDGNGYISKAEMLVIVQAI 128 

Q^^fy I'^e 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus: 
HMM *EIqEMFrmMDkDGDGyIDFEEFmeMMkem* 

+++FR MD'l-4-^DG-l>^+ EEF++ K+ 
Query 148 RTEKIFRQMDTNRDGKLSLEEFIRGAKSD 176 
DKFZphfbr2_23f2 


group: brain derived 

DKFZphfbr2_23f2 encodes a novel 182 amino acid protein with weak similarity to S. pombe 
Vps29p. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to Vps29p 

complete cDNA, complete cds, EST hits 

S.cerevisiae and S. pombe Vps29p are involved in vacuolar protein 
sorting 

part of the cDNA is encoded by HSAC2350, splice pattern 4 exons 
Sequenced by AGOWA 
Locus: /map="12q24" 
Insert length: 1016 bp 

Poly A stretch at pos. 996, polyadenylation signal at pos. 974 

1 GAATGGGGAG GAGCCAGAGG AAGAGGGCGG CGACGGTGGT GGTGACTGAG 
51 CGGAGCCCGG TGACAGGATG TTGGTGTTGG TATTAGGAGA TCTGCACATC 
101 CCACACCGGT GCAACAGTTT GCCAGCTAAA TTCAAAAAAC TCCTGGTGCC 
151 AGGAAAAATT CAGCACATTC TCTGCACAGG AAACCTTTGC ACCAAAGAGA 
201 GTTATGACTA CCTCAAGACT CTGGCTGGTG ATGTTCATAT TGTGAGAGGA 
251 GACTTCGATG AGAATCTGAA TTATCCAGAA CAGAAAGTTG TGACTGTTGG 
301 ACAGTTCAAA ATTGGTCTGA TCCATGGACA TCAAGTTATT CCATGGGGAG 
351 ATATGGCCAG CTTAGCCCTG TTGCAGAGGC AATTTGATGT GGACATTCTT 
401 ATCTCGGGAC ACACACACAA ATCTGAAGCA TTTGAGCATG AAAATAAATT 
451 CTACATTAAT CCAGGTTCTG CCACTGGGGC ATATAATGCC TTGGAAACAA 
501 ACATTATTCC ATCATTTGTG TTGATGGATA TCCAGGCTTC TACAGTGGTC 
551 ACCTATGTGT ATCAGCTAAT TGGAGATGAT GTGAAAGTAG AACGAATCGA 
601 ATACAAAAAA CCTTAAAGCC AGGCCTGTCT TGATGATTTT TGGTTTTTTT 
651 TCATTGTCCT GTTGAAATCA AGTAATTAAA CATTTAAGAG CCACAAAATT 
701 GTATCACTTT TATAATATTT TGCAGTAAAA TATAATACCA TCTTCTCTGT 
751 TAATACATAA TTGCTCCAAG CTTCCTGTAA ACTATAAGAA TATATTTAGT 
801 TTACAGTATA TGGATTCTAT GAAAAAATGT CCACAACACA GTAATTGGTC 
851 ACTTGTTAAG AAAAATTTAT CCTTGTAAGT ATCTTCAAAG TTGATATTTG 
901 GAACTTTATT CCAAAAGTAG TGCATGTGGA GAAAGAATCT AGACTTTCTT 
951 GTATACATTT TTCTCTTCTC CAGTAATAAA CAATTACCTT TCATTGAAAA 
1001 AAAAAAAAAA AAAAAA 
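
Because the listing is numbered in blocks of 50 bases, each line index can be checked against the number of bases that precede it, and the final length can be compared with the stated insert length. The sketch below illustrates such a consistency check; `parse_listing` is a hypothetical helper, not part of any tool named in this document.

```python
def parse_listing(lines):
    """Join numbered 10-base groups into one string, checking the line indices.

    Each line looks like '951 GTATACATTT TTCTCTTCTC ...', where the leading
    integer is the 1-based position of the first base on that line.
    Returns (start_position, sequence).
    """
    start = None
    groups_seen = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        index, groups = int(fields[0]), fields[1:]
        if start is None:
            start = index
        expected = start + sum(len(g) for g in groups_seen)
        if index != expected:
            raise ValueError(f"line index {index} != expected {expected}")
        groups_seen.extend(groups)
    return start, "".join(groups_seen)


start, seq = parse_listing([
    "951 GTATACATTT TTCTCTTCTC CAGTAATAAA CAATTACCTT TCATTGAAAA",
    "1001 AAAAAAAAAA AAAAAA",
])
print(start + len(seq) - 1)  # last position covered; 1016 matches the insert length
```
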


BLAST Results 


Entry HSAC2350 from database EMBLNEW: 

Homo sapiens 12q24 PAC P424M6 Length = 167,217 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 68 bp to 613 bp; peptide length: 182 
Category: similarity to known protein 
Prosite motifs: RGD (60-63) 


1 MLVLVLGDLH IPHRCNSLPA KFKKLLVPGK IQHILCTGNL CTKESYDYLK 
51 TLAGDVHIVR GDFDENLNYP EQKVVTVGQF KIGLIHGHQV IPWGDMASLA 
101 LLQRQFDVDI LISGHTHKSE AFEHENKFYI NPGSATGAYN ALETNIIPSF 
151 VLMDIQASTV VTYVYQLIGD DVKVERIEYK KP 
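
The ORF coordinates and the peptide length are mutually consistent: positions 68 to 613 span 546 bases, i.e. 182 codons, with the stop codon lying just outside the stated range. A minimal Python sketch of the translation step follows, assuming 1-based inclusive coordinates and the standard genetic code; the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start, end):
    """Translate seq[start..end] (1-based, inclusive); unknown codons give 'X'."""
    orf = seq.upper()[start - 1:end]
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X")
                   for i in range(0, len(orf), 3))


# Listing positions 51-100 of the DKFZphfbr2_23f2 insert.
FRAGMENT_51 = "CGGAGCCCGGTGACAGGATGTTGGTGTTGGTATTAGGAGATCTGCACATC"
# Position 68 of the insert is position 18 of this fragment.
print(translate_orf(FRAGMENT_51, 18, 50))  # MLVLVLGDLHI
print((613 - 68 + 1) // 3)                 # 182
```
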

BLASTP hits 

Entry CEZK1128_6 from database TREMBL: 
"ZK1128.1"; Caenorhabditis elegans cosmid ZK1128 
Length = 523 

Score = 400 (140.8 bits), Expect = 2.3e-37, P = 2.3e-37 
Identities = 81/150 (54%), Positives = 106/150 (70%) 

Entry S46793 from database PIR: 

hypothetical protein YHR012c - yeast (Saccharomyces cerevisiae) 
Length = 282 

Score = 180 (63.4 bits), Expect = 3.7e-37, Sum P(3) = 3.7e-37 
Identities = 35/71 (49%), Positives = 44/71 (61%) 

Entry AB011824_1 from database TREMBL: 
"Vps29"; Schizosaccharomyces pombe mRNA for Vps29, 
partial cds. Schizosaccharomyces pombe (fission yeast) 
Length = 176 

Score = 189 (66.5 bits), Expect = 2.7e-27, Sum P(2) = 2.7e-27 
Identities = 33/72 (45%), Positives = 50/72 (69%) 


Alert BLASTP hits for DKFZphfbr2_23f2, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23f2, frame 2 

Report for DKFZphfbr2_23f2.2 


[LENGTH] 182 
[MW] 20445.84 
[pI] 6.29 
[HOMOL] TREMBL:CEZK1128_6 gene: "ZK1128.8"; Caenorhabditis elegans cosmid ZK1128 2e-51 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YHR012w] 1e-27 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YHR012w] 1e-27 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YHR012w] 1e-27 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YHR012w] 1e-27 
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YHR012w] 1e-27 
[FUNCAT] r general function prediction [M. jannaschii, MJ0623] 1e-16 
[BLOCKS] BL01269D 
[BLOCKS] BL01269A 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] PKC_PHOSPHO_SITE 1 
[KW] Alpha_Beta 


SEQ MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR 

PRD ccceeecccccccccccchhhhhhhhhhcceeeeeecccccchhhhhhhhhhhhceeeee 

SEQ GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE 

PRD cccccccccccceeeeeccceeeeecccccccccchhhhhhhhhhhcceeeeeccccccc 

SEQ AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK 

PRD ccccccccccccccccccccccccccccceeececcccceeeeeeeecccceeeeeeeec 

SEQ KP 

PRD cc 


Prosite for DKFZphfbr2_23f2.2 

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00008 38->44 MYRISTYL PDOC00008 
PS00008 83->89 MYRISTYL PDOC00008 
PS00008 133->139 MYRISTYL PDOC00008 
PS00008 137->143 MYRISTYL PDOC00008 
PS00016 60->63 RGD PDOC00016 
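
Hits of this kind can be reproduced by converting a PROSITE-style pattern into a regular expression and scanning the peptide for overlapping matches. The sketch below is an illustration under stated assumptions: the two patterns used (`R-G-D` for RGD and `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}` for MYRISTYL) are quoted from memory of the PROSITE entries, only the basic pattern syntax is handled, and positions are reported as a 1-based start with the end one past the last residue, which is what the listing above appears to use.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern (no anchors) into a regex string."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(peptide, pattern):
    """Return (start, end) pairs; start is 1-based, end is start + match length."""
    # A lookahead lets overlapping matches be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(peptide.upper())]


# First 63 residues of the DKFZphfbr2_23f2 peptide.
N_TERM = "MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVRGDF"
print(scan(N_TERM, "R-G-D"))                            # [(60, 63)]
print(scan(N_TERM, "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}"))  # [(38, 44)]
```
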


(No Pfam data available for DKFZphfbr2_23f2.2) 
DKFZphfbr2_23l24 


group: intracellular transport and trafficking 

DKFZphfbr2_23l24.2 encodes a novel 348 amino acid protein with similarity to human 
glycoprotein GP36b and canine VIP36 glycoprotein. 

The vesicular protein VIP36 (36 kDa vesicular integral membrane protein) shows homology to 
leguminous plant lectins. The protein is localized to the Golgi apparatus, endosomal and 
vesicular structures and the plasma membrane. VIP36 binds to sugar residues of 
glycosphingolipids and/or glycosylphosphatidyl-inositol anchors and might provide a link 
between the extracellular/luminal face of glycolipid rafts and the cytoplasmic protein 
segregation machinery. Gp36 is located within the endoplasmic reticulum. For the novel 
protein, a lectin character is predicted. Owing to the intracellular localisation of the 
homologous proteins, it should be involved in intracellular transport and trafficking. 

The new protein can find application in modulating/blocking intracellular transport and 
trafficking. 


strong similarity to human GP36b glycoprotein 

complete cDNA, complete cds, EST hits 

potential start at Bp 29 matches Kozak consensus ANNatgG 
similarity to lectins. 

Sequenced by AGOWA 

Locus: /map="2" 

Insert length: 2416 bp 

Poly A stretch at pos. 2394, no polyadenylation signal found 
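
The Kozak note above can be checked directly against the listing: under the simplified consensus ANNatgG, the base three positions upstream of the ATG must be A and the base immediately after it must be G. A minimal Python sketch, assuming a 1-based start position; the function name is illustrative.

```python
def matches_kozak(seq, atg_pos):
    """True if the ATG at 1-based position atg_pos fits the pattern ANNatgG."""
    seq = seq.upper()
    i = atg_pos - 1                      # 0-based index of the A in ATG
    if i < 3 or i + 3 >= len(seq):
        return False                     # not enough flanking sequence
    return seq[i:i + 3] == "ATG" and seq[i - 3] == "A" and seq[i + 3] == "G"


# First 40 bases of the DKFZphfbr2_23l24 insert.
FIRST_40 = "GGGGGATGAAGGGTCGTTGGTGGGAAAGATGGCGGCGACT"
print(matches_kozak(FIRST_40, 29))  # True: the annotated start
print(matches_kozak(FIRST_40, 6))   # False: upstream ATG lacks A at -3
```
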


1 GGGGGATGAA GGGTCGTTGG TGGGAAAGAT GGCGGCGACT CTGGGACCCC 
51 TTGGGTCGTG GCAGCAGTGG CGGCGATGTT TGTCGGCTCG GGATGGGTCC 
101 AGGATGTTAC TCCTTCTTCT TTTGTTGGGG TCTGGGCAGG GGCCACAGCA 
151 AGTCGGGGCG GGTCAAACGT TCGAGTACTT GAAACGGGAG CACTCGCTGT 
201 CGAAGCCCTA CCAGGGTGTG GGCACAGGCA GTTCCTCACT GTGGAATCTG 
251 ATGGGCAATG CCATGGTGAT GACCCAGTAT ATCCGCCTTA CCCCAGATAT 
301 GCAAAGTAAA CAGGGTGCCT TGTGGAACCG GGTGCCATGT TTCCTGAGAG 
351 ACTGGGAGTT GCAGGTGCAC TTCAAAATCC ATGGACAAGG AAAGAAGAAT 
401 CTGCATGGGG ATGGCTTGGC AATCTGGTAC ACAAAGGATC GGATGCAGCC 
451 AGGGCCTGTG TTTGGAAACA TGGACAAATT TGTGGGGCTG GGAGTATTTG 
501 TAGACACCTA CCCCAAPGAG GAGAAGCAGC AAGAGCGGGT ATTCCCCTAC 
551 ATCTCAGCCA TGGTGAACAA CGGCTCCCTC AGCTATGATC ATGAGCGGGA 
601 TGGGCGGCCT ACAGAGCTGG GAGGCTGCAC AGCCATTGTC CGCAATCTTC 
651 ATTACGACAC CTTCCTGGTG ATTCGCTACG TCAAGAGGCA TTTGACGATA 
701 ATGATGGATA TTGATGGCAA GCATGAGTGG AGGGACTGCA TTGAAGTGCC 
751 CGGAGTCCGC CTGCCCCGCG GCTACTACTT CGGCACCTCC TCCATCACTG 
801 GGGATCTCTC AGATAATCAT GATGTCATTT CCTTGAAGTT GTTTGAACTG 
851 ACAGTGGAGA GAACCCCAGA AGAGGAAAAG CTCCATCGAG ATGTGTTCTT 
901 GCCCTCAGTG GACAATATGA AGCTGCCTGA GATGACAGCT CCACTGCCGC 
951 CCCTGAGTGG CCTGGCCCTC TTCCTCATCG TCTTTTTCTC CCTGGTGTTT 
1001 TCTGTATTTG CCATAGTCAT TGGTATCATA CTCTACAACA AATGGCAGGA 
1051 ACAGAGCCGA AAGCGCTTCT ACTGAGCCCT CCTGCTGCCA CCACTTTTGT 
1101 GACTGTCACC CATGAGGTAT GGAAGGAGCG GGCACTGGCC TGAGCATGCA 
1151 GCCTGGAGAG TGTTCTTGTC TCTAGCAGCT GGTTGGGGAC TATATTCTGT 
1201 CACTGGAGTT TTGAATGCAG GGACCCCGCA TTCCCATGGT TGTGCATGGG 
1251 GACATCTAAC TCTGGTCTGG GAAGCCACCC ACCCCAGGGC AATGCTGCTG 
1301 TGATGTGCCT TTCCCTGCAG TCCTTCCATG TGGGAGCAGA GGTGTGAAGA 
1351 GAATTTACGT GGTTGTGATG CCAAAATCAC GGAACAGAAT TTCATAGCCC 
1401 AGGCTGCCGT GTTGTTTGAC TCAGAAGGCC CTTCTACTTC AGTTTTGAAT 
1451 CCACAAAGAA TTAAAAACTG GTAACACCAC AGGCTTTCTG ACCATCCATT 
1501 CGTTGGGTTT TGCATTTGAC CCAACCCTCT GCCTACCTGA GGAGCTTTCT 
1551 TTGGAAACCA GGATGGAAAC TTCTTCCCTG CCTTACCTTC CTTTCACTCC 
1601 ATTCATTGTC CTCTCTGTGT GCAACCTGAG CTGGGAAAGG CATTTGGATG 
1651 CCTCTCTGTT GGGGCCTGGG GCTGCAGAAC ACACCTGCGT TTCGCTGGCC 
1701 TTCATTAGGT GGCCCTAGGG AGATGGCTTT CTGCTTTGGA TCACTGTTCC 
1751 CTAGCATGGG TCTTGGGTCT ATTGGCATGT CCATGGCCTT CCCAATCAAG 
1801 TCTCTTCAGG CCCTCAGTGA AGTTTGGCTA AAGGTTGGTG TAAAAATCAA 
1851 GAGAAGCCTG GAAGACACCA TGGATGCCAT GGATTAGCTG TGCAACTGAC 
1901 CAGCTCCAGG TTTGATCAAA CCAAAAGCAA CATTTGTCAT GTGGTCTGAC 
1951 CATGTGGAGA TGTTTCTGGA CTTGCTAGAG CCTGCTTAGC TGCATGTTTT 
2001 GTAGTTACGA TTTTTGGAAT CCCTCTTTGA GTGCTGAAAG TGTAAGGAAG 
2051 CTTTCTTCTT ACACCTTGGG CTTGGATATT GCCCAGAGAA GAAATTTGGC 
2101 TTTTTTTTCT TAATGGACAA GGGACAGTTG CTGTTCTCAT GTTCCAAGTC 
2151 TGAGAQCAAC AGACCCTCAT CATCTGTGCC TGGAAGAGTT CACTGTCATT 
2201 GAGCAGCACA GCCTGAGTGC TGGCCTCTGT CAACCCTTAT TCCACTGCCT 
2251 TATTTGACAA GGGGTTACAT GCTGCTCACC TTACTGCCCT GGGATTAAAT 
2301 CAGTTACAGG CCAGAGTCTC CTTGGAGGGC CTGGAACTCT GAGTCCTCCT 
2351 ATGAACCTCT GTAGCCTAAA TGAAATTCTT AAAATCACCG ATGGAACCAA 
2401 AAAAAAAAAA AAAAAA 


BLAST Results 


Entry HS622145 from database EMBL: 
human STS WI-6745. 
Score = 1079, P = 5.1e-43, identities = 219/223 

Entry G42541 from database EMBLNEW: 

SHGC-58649 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 1091, P = 1.7e-43, identities = 219/220 


Medline entries 


94265253: 

A putative novel class of animal lectins in the secretory pathway 
homologous to leguminous 
lectins. 

94208543: 

VIP36, a novel component of glycolipid rafts and exocytic carrier 
vesicles in epithelial cells. 


Peptide information for frame 2 


ORF from 29 bp to 1072 bp; peptide length: 348 
Category: strong similarity to known protein 


1 MAATLGPLGS WQQWRRCLSA RDGSRMLLLL LLLGSGQGPQ QVGAGQTFEY 
51 LKREHSLSKP YQGVGTGSSS LWNLMGNAMV MTQYIRLTPD MQSKQGALWN 
101 RVPCFLRDWE LQVHFKIHGQ GKKNLHGDGL AIWYTKDRMQ PGPVFGNMDK 
151 FVGLGVFVDT YPNEEKQQER VFPYISAMVN NGSLSYDHER DGRPTELGGC 
201 TAIVRNLHYD TFLVIRYVKR HLTIMMDIDG KHEWRDCIEV PGVRLPRGYY 
251 FGTSSITGDL SDNHDVISLK LFELTVERTP EEEKLHRDVF LPSVDNMKLP 
301 EMTAPLPPLS GLALFLIVFF SLVFSVFAIV IGIILYNKWQ EQSRKRFY 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23l24, frame 2 

PIR:G01447 GP36b glycoprotein - human, N = 1, Score = 1001, P = 
5.9e-101 


SWISSPROT:VP36_CANFA VESICULAR INTEGRAL-MEMBRANE PROTEIN VIP36 
PRECURSOR (VIP36)., N = 1, Score = 990, P = 8.6e-100 

TREMBL:CET04G9_2 gene: "T04G9.3"; Caenorhabditis elegans cosmid 
T04G9., N = 1, Score = 614, P = 6e-60 

PIR:S42626 ER-golgi intermediate compartment protein - human, N = 2, 
Score = 397, P = 1e-42 


>PIR:G01447 GP36b glycoprotein - human 
Length = 356 

HSPs: 


Score = 1001 (150.2 bits), Expect = 5.9e-101, P = 5.9e-101 
Identities = 197/356 (55%), Positives = 256/356 (71%) 

Query: 1 MAATLGPLGSWQQWRRCLSARDG SRMLLLLLLLGSGQGPQQVGAGQTFEYLK 52 

MAA G + W RRCL R G + L LLLLLGS + G + E+LK 

Sbjct: 1 MAAE-GWIWRWGWGRRCLG-RPGLLGPGPGPTTPLFLLLLLGSVTA--DITDGNS-EHLK 55 
Query: 53 REHSLSKPYQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQ 112 

REHSL KPYQGVG+ S LW+ G+ M+ +QY+RLTPD +SK+G++WN PCFL+DWE+ 
Sbjct: 56 REHSLIKPYQGVGSSSMPLWDFQGSTMLTSQYVRLTPDERSKEGSIWNHQPCFLKDWEMH 115 

Query: 113 VHFKIHGQGKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVF 172 

VHFK+HG GKKNLHGDG+A+WYT+DR+ PGPVFG+ D F GL +F+DTYPN+E ERVF 
Sbjct: 116 VHFKVHGTGKKNLHGDGIALWYTRDRLVPGPVFGSKDNFHGLAIFLDTYPNDETT-ERVF 174 

Query: 173 PYISAMVNNGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKH 232 

PYIS MVNNGSLSYDH +DGR TEL GCTA RN +DTFL +RY + LT+M D++ K+ 
Sbjct: 175 PYISVMVNNGSLSYDHSKDGRWTELAGCTADFRNRDHDTFLAVRYSRGRLTVMTDLEDKN 234 

Query: 233 EWRDCIEVPGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLP 292 

EW++CI++ GVRLP GYYFG S+ TGDLSDNHD+IS+KLF+L VE TP+EE + P 
Sbjct: 235 EWKNCIDITGVRLPTGYYFGASAGTGDLSDNHDIISMKLFQLMVEHTPDEESIDWTKIEP 294 

Query: 293 SVDNMKLPEMTAPLP PLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRK 345 

SV+ +K P+ P PL+G +FL++ +L+ V V+G +++ K QE++ K 

Sbjct: 295 SVNFLKSPKDNVDDPTGNFRSGPLTGWRVFLLLLCALLGIVVCAWGAWFQKRQERM-K 353 

Query: 346 RFY 348 
RFY 

Sbjct: 354 RFY 356 


Pedant information for DKFZphfbr2_23l24, frame 2 


Report for DKFZphfbr2_23l24.2 


[LENGTH] 348 

[MW] 39711.10 

[pI] 8.55 

[HOMOL] PIR:G01447 GP36b glycoprotein - human 1e-101 

[PIRKW] lectin 2e-37 

[PIRKW] transmembrane protein 2e-37 

[PIRKW] endoplasmic reticulum 2e-37 

[PIRKW] Golgi apparatus 2e-37 

[PROSITE] AMIDATION 1 

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

[KW] SIGNAL_PEPTIDE 39 

[KW] LOW_COMPLEXITY 7.76 % 
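
The [MW] field is a molecular weight derived from the amino acid composition. As a rough illustration of how such a value can be computed, the sketch below sums average residue masses and adds one water for the free termini. The mass table holds commonly tabulated average values quoted from memory, and the exact table used by Pedant is not stated in this document, so small deviations from the reported figure should be expected.

```python
# Average residue masses in Da (assumed standard values, not taken from this document).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(peptide):
    """Average molecular weight of an unmodified linear peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


# C-terminal residues of the DKFZphfbr2_23l24 peptide, as a small example.
print(round(molecular_weight("EQSRKRFY"), 2))
```
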


SEQ MAATLGPLGSWQQWRRCLSARDGSRMLLLLLLLGSGQGPQQVGAGQTFEYLKREHSLSKP 

SEG xxxxxxx 

PRD ccccccccccccccccccccccchhhhhhhhhhhcccccccccccchhhhhhhhhhhccc 

SEQ YQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQVHFKIHGQ 

SEG 

PRD cccccccccceeecccccccccceeeeccchhhhhcccccccccchhhhhhhheeeeecc 

SEQ GKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVFPYISAMVN 

SEG 

PRD ccccccccceeeeeecccccccccccccccccceeeeeecccccccccccccceeeeeec 

SEQ NGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKHEWRDCIEV 

SEG 

PRD ccccccccccccccccccccccccccccccceeeehhhhhhheeeeeccccccccccccc 

SEQ PGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLPSVDNMKLP 

SEG 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhccccccccccccccccccccccc 

SEQ EMTAPLPPLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRKRFY 

SEG XXXXXXXXXXXXXXXXXXXX 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 
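
The LOW_COMPLEXITY figure in the report agrees with the SEG mask lines above: 7 + 20 = 27 masked residues out of 348 gives 7.76 %. A small Python sketch of that arithmetic follows; the function name is illustrative.

```python
def low_complexity_percent(seg_masks, length):
    """Percentage of residues masked with x/X across all SEG mask lines."""
    masked = sum(mask.lower().count("x") for mask in seg_masks)
    return round(100.0 * masked / length, 2)


# Mask runs for DKFZphfbr2_23l24.2: 7 residues near the N-terminus and
# 20 residues in the C-terminal hydrophobic stretch; peptide length 348.
print(low_complexity_percent(["x" * 7, "X" * 20], 348))  # 7.76
```
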


Prosite for DKFZphfbr2_23l24.2 

PS00001 181->185 ASN_GLYCOSYLATION PDOC00001 
PS00002 35->39 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 19->22 PKC_PHOSPHO_SITE PDOC00005 
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006 
PS00006 279->283 CK2_PHOSPHO_SITE PDOC00006 
PS00008 43->49 MYRISTYL PDOC00008 
PS00008 63->69 MYRISTYL PDOC00008 
PS00008 65->71 MYRISTYL PDOC00008 
PS00008 96->102 MYRISTYL PDOC00008 
PS00008 198->204 MYRISTYL PDOC00008 
PS00009 120->124 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfbr2_23l24.2) 




WO 01/12659 PCT/IB00/01496 


DKFZphfbr2_23n16 


group: signal transduction 

DKFZphfbr2_23n16.1 encodes a novel 292 amino acid protein with weak similarity to putative 
phosphatidylinositol-4-phosphate 5-kinase of Arabidopsis thaliana. 

The novel protein contains a WW domain, originally described as a short conserved region in 
a number of unrelated proteins, among them dystrophin, the product of the gene responsible 
for Duchenne muscular dystrophy. The domain, which spans about 35 residues, is repeated up 
to 4 times in some proteins. It has been shown to bind proteins with particular proline 
motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. This domain is frequently 
associated with other domains typical of proteins in signal transduction processes. Examples 
of proteins containing the WW domain are dystrophin, utrophin, vertebrate YAP protein (binds 
the SH3 domain of the Yes oncoprotein), murine NEDD-4 (embryonic development and 
differentiation of the central nervous system), and IQGAP (human GTPase-activating protein 
acting on ras). The new protein should therefore be involved in intracellular signal 
transduction. 

The new protein can find application in modulating/blocking intracellular signal transduction 
pathways. 


similarity to putative phosphatidylinositol-4-phosphate 5-kinase 
complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 
Locus: unknown 

Insert length: 2936 bp 

Poly A stretch at pos. 2916, polyadenylation signal at pos. 2873 


1 GGGGGCGCTC CCGAGAAAGA GTGAGGGCGC GACGCGCACC AACGGTGGAG 
51 GGATGTTTCA GCAGCCCCTG AGAAGGAAGA GGAGGAAGCT GAGGGCCCGC 
101 TGAGGGCGCA GGACCTGAGG GAGTCCTACA TCCAGCTCGT CCAGGGTGTG 
151 CAGGAGTGGC AGGATGGTTG CATGTACCAG GGGGAGTTTG GGTTGAACAT 
201 GAAGCTTGGA TATGGCAAAT TCTCTTGGCC CACAGGCGAG TCATACCATG 
251 GGCAGTTTTA CCGGGACCAC TGCCATGGCC TGGGTACCTA CATGTGGCCA 
301 GATGGCTCCA GTTTCACGGG CACATTTTAC CTCAGCCACC GAGAAGGCTA 
351 CGGCACCATG TACATGAAGA CACGGCTTTT CCAGACTCAC TGCCACAACG 
401 ACATTGTCAA CCTTCTCCTG GACTGTGGGG CCGACGTGAA CAAGTGCTCA 
451 GATGAGGGTC TCACGGCACT CAGCATGTGT TTCCTCCTCC ACTACCCCGC 
501 CCAGTCCTTC AAGCCCAATG TTGCTGAACG GACCATACCT GAGCCCCAGG 
551 AACCTCCAAA ATTCCCAGTT GTTCCAATCC TTTCATCATC ATTTATGGAC 
601 ACAAACCTGG AGTCTCTGTA CTATGAGGTG AACGTGCCTT CCCAGGGTAG 
651 CTATGAGGTG AGGCCACCGC CAGCACCACT GCTCCTGCCA CGCGTCTCAG 
701 GCAGCCACGA GGGCGGCCAC TTCCAGGACA CCGGGCAGTG TGGGGGGTCC 
751 ATAGACCACA GGAGCAGCTC TCTGAAGGGG GACTCCCCGT TGGTGAAGGG 
801 CAGCCTTGGC CATGTGGAAA GCGGGCTTGA GGACGTGTTG GGAGACACAG 
851 ACCGGGGCAG TCTGTGCAGT GCTGAGACGA AATTTGAGTC CAACTTGTGT 
901 GTGTGCGACT TCTCCATCGA GCTCTCGCAG GCCATGCTGG AGAGAAGCGC 
951 CCAGTCCCAC AGCTTGCTGA AGATGGCCTC GCCCTCACCG TGCACCAGCA 
1001 GCTTCGACAA AGGGACCATG CGGAGGATGG CGCTGTCCAT GATCGAGTAG 
1051 GTCCTGGCAC CAGCTGGTGG GGGTGGAGGG CCACCATCAG GGCTGAATCC 
1101 TATGCTCAGC AGACCCACGT CTCTTCCCTG TGCCAGTGGG AGGCGTTGTG 
1151 TCTGGAGATG TGTGTCTGAA TGTGTGAGCA TCCCTGTGTC GGTGGCTCCA 
1201 TGCCATGGCC AGCCCTGTGG GGGTGCCACG GTGACGGGCT GTTTTCAGTG 
1251 CCACCCCAGC CCTGTGGGGG TGCCACGGTG ACGGGCTGTT TTCAGTACCA 
1301 CGCCAGCCCT GCTTTGGCCT TTGGCACTGG CCTGAAGTGT CTCTGTGGGA 
1351 GCCTCAGCAG GGGCCACTGT CAGGGGTCCT ATCCTAGCCA TAGTGCACGT 
1401 GAGTGACACC TGCCTGGGCA GCTCTCACAC CCCTGCTGTC CACCCTGTCT 
1451 ATACCAGTGT GTCTCAAAAT GTGGTCTATG CACCCCCGGG GGTCCAAGAC 
1501 CCTTTCAGGG AGTCTGTGGG GTCAAAATGA TTCTCTTGAT AACCCTGAGA 
1551 CTCTGTTAGC CTTCTCCTTG TGTTGATGTT GGTGGATGGT ATGAAGACAG 
1601 GGCCGTGCAG ACCACCAGCC CCCAGCGTGC AGGGCAGCAG TGCCCGGCCT 
1651 GCTTGGGGGC ATGGTATTCC TTCACCACGG TGTGCACTTG CGGGGATGCC 
1701 TGTCTCACTG AAGAATGCCT TTGACTAAGC AGAAAAGCAA TGACT^TTG 
1751 CATTAAATCT TGCTCCTTGC GTACACACCC CTCGAATATT CTGGGTCGGA 
1801 AAACATGGGA AGGACACTGA TGTGTGTCTG CCACAGACCA AGGCACACCG 
1851 CTTCCCCGCA AGAAGCGCTT CCCCCAGGGC CAGAGTAGCA ACAGAATGCG 
1901 GCATCTTCCC AACCTCCTGC CCCATTTTTG ATTGGAAGAA TGACCACTGG 
1951 TATGTGGCTG TTCATTCTCC TGAACACAGC CTGCCACTTT AAGGAAAACA 
2001 TATGACACTA TTTGTTGCTG GCGAAATTTA CATTTTCAAG TGAATAGCAG 
2051 AATTCTGGAC ACTTGCCACC ACCACCAAAA CCTTCATAGC TTCCCTTAAC 
2101 TTTGAGACAT GGGTGTTCAG AGGTTTTTCA CGTGAGATGG CGTTAGCAGC 
2151 GCAGTTTTGT GATACTGCCT GAAGACATGC CGACAGTGCC CAGATCTCTT 
2201 CTATTGGTGA GCCAGCTTTT CCCACACGGC CAAGTTCTGA TGTTGAACCA 
2251 TTGCCAGGTG GGTGAAGATC CATTGACAGT GAGAGGTGGG CCCGTGGGCT 
2301 TCAGTGCAGC CAGGCGCAGA AGGCTGGTTC ATGAGTGTCC AGCTCCGCCA 
2351 GGTAGCTAGC TCACCACCCC CAGCCTGGGT TCATGTAGTT CAAATAGGAA 
2401 GACCACGATG ATCAGAAAGG CTGCTCAAAT ACTCCTTCGT CCAGCCGCGT 
2451 ACCTGGGGGA GGCTGAATCT CCACTCACTT CCACCAAGGC TGTGCAGAGC 
2501 AGATAGGGGA ATCCAGCAAA GGTGGAAAAC AGTGCCATCC TTCTCCCCAA 
2551 CTGGTTTTGT TTTGTAAAAT AACTTTTTGT GACAGTGTTA CTTATTAGTA 
2601 ACATGCAGTG GGTTTGTTAT GGTTAACAAG TTGGTGAGCA TTATTGAGAG 
2651 GTGAAGCCAG CTGAGCTTCT GGGTTGGGTG GGGACTTGGA GAACTTTTGT 
2701 GTCTAGCTAA AGGATTGTAA ATGCACCAAT CAATGCTCAG TGTCTAGCTA 
2751 AAGGATTGTA AATGCACCAA TCAGCACTCT GTAAAATTGA CCAATCAGCG 
2801 TTCTGTAAAA TGGACCAATC AGTGGTCTGT AAAATGGACC AGTCAGCAGG 
2851 ATGTGGGCGG GGCCAAAAAA GGGAATAAAA GCTGGCCACC GCCAGGCTCC 
2901 CCACCAGCCT GCAGCGAAAA AAAAAAAAAA AAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 172 bp to 1047 bp; peptide length: 292 
Category: similarity to unknown protein 
Prosite motifs: WW_DOMAIN_1 (19-44) 


1 MYQGEFGLNM KLGYGKFSWP TGESYHGQFY RDHCHGLGTY MWPDGSSFTG 
51 TFYLSHREGY GTMYMKTRLF QTHCHNDIVN LLLDCGADVN KCSDEGLTAL 
101 SMCFLLHYPA QSFKPNVAER TIPEPQEPPK FPVVPILSSS FMDTNLESLY 
151 YEVNVPSQGS YELRPPPAPL LLPRVSGSHE GGHFQDTGQC GGSIDHRSSS 
201 LKGDSPLVKG SLGHVESGLE DVLGDTDRGS LCSAETKFES NLCVCDFSIE 
251 LSQAMLERSA QSHSLLKMAS PSPCTSSFDK GTMRRMALSM IE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23n16, frame 1 

TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for 
AtPIP5K1, complete cds., N = 2, Score = 138, P = 1.1e-06 

TREMBL:AF019380_1 product: "putative phosphatidylinositol-4-phosphate 
5-kinase"; Arabidopsis thaliana putative 
phosphatidylinositol-4-phosphate 5-kinase mRNA, complete cds., N = 2, 
Score = 138, P = 1.4e-06 

PIR:TD2098 probable phosphatidylinositol-4-phosphate 5-kinase - 
Arabidopsis thaliana, N = 2, Score = 135, P = 6.7e-06 


>TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for 
AtPIP5K1, complete cds. 
Length = 683 

HSPs: 


Score = 138 (20.7 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06 
Identities = 23/61 (37%), Positives = 35/61 (57%) 

Query: 1 MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 60 

MY+G++ G GKFSWP+G +Y G+F G GT+ DC ++ GT+ + G+ 

Sbjct: 34 MYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGH 93 

Query: 61 G 61 
G 

Sbjct: 94 G 94 




Score = 112 (16.8 bits), Expect = 9.7e-04, Sum P(2) = 9.7e-04 
Identities = 19/51 (37%), Positives = 27/51 (52%) 

Query: 12 LGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYGT 62 

+G GK+ W G YG + R GG + WP G+++ G F EG+GT 
Sbjct: 22 IGSGKYLWKDGCMYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGT 72 
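
The "Identities" figure of an HSP can be recomputed from the aligned Query and Sbjct strings by counting columns in which both residues are the same. A minimal Python sketch for the gapless HSP above follows; the function name is illustrative, and gap columns, where present, are counted in the alignment length but never as identities.

```python
def count_identities(query, subject, gap="-"):
    """Return (identical_columns, alignment_length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(q == s for q, s in zip(query, subject)
                    if q != gap and s != gap)
    return identical, len(query)


QUERY_12_62 = "LGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYGT"
SBJCT_22_72 = "IGSGKYLWKDGCMYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGT"
identical, length = count_identities(QUERY_12_62, SBJCT_22_72)
print(f"{identical}/{length} ({round(100 * identical / length)}%)")  # 19/51 (37%)
```
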

Score = 97 (14.6 bits), Expect = 4.4e-02, Sum P(2) = 4.3e-02 
Identities = 19/60 (31%), Positives = 32/60 (53%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+GEF G+G F+ G++Y G + D HG G + +G + GT+ + ++G G 

Sbjct: 58 YEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRG 117 

Score = 93 (14.0 bits), Expect = 1.2e-01, Sum P(2) = 1.1e-01 
Identities = 18/62 (29%), Positives = 34/62 (54%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + + K G+G+ + G+ Y G + R+ G G Y+W +G+ +TG + + G G 
Sbjct: 81 YRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKG 140 

Query: 62 TM 63 
+ 

Sbjct: 141 LL 142 

Score = 91 (13.7 bits), Expect = 2.0e-01, Sum P(2) = 1.8e-01 
Identities = 18/51 (35%), Positives = 24/51 (47%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTF 52 

YGE+ + +GG WPGYG+ GG+W DGSS G + 

Sbjct: 127 YTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNGVFTWSDGSSCVGAW 177 

Score = 90 (13.5 bits), Expect = 2.6e-01, Sum P(2) = 2.3e-01 
Identities = 17/60 (28%), Positives = 31/60 (51%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + N++ G G++ W G Y G++ G G +WP+G+ + G + +G G 

Sbjct: 104 YEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNG 163 

Score = 45 (6.8 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06 
Identities = 14/62 (22%), Positives = 26/62 (41%) 

Query: 215 VESGLEDVLGDTDRGSLCSAETKFESNLCVCDF--SIELSQAMLERSAQSHSLLKMASPS 272 

V+SG + G+ +C E+ E+ CD ++E S +R + + + 

Sbjct: 205 VDSGAGSLGGEKVFPRICIWESDGEAGDITCDIIDNVEASMIYRDRISVDRDGFRQFKKN 264 

Query: 273 PC 274 
PC 

Sbjct: 265 PC 266 


Pedant information for DKFZphfbr2_23n16, frame 1 


Report for DKFZphfbr2_23n16.1 


[LENGTH] 292 

[MW] 32214.44 

[pI] 5.51 

[HOMOL] TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for AtPIP5K1, complete cds. 7e-08 

[BLOCKS] BL01137A Hypothetical YBL055c/yjjV family proteins 

[PROSITE] WW_DOMAIN_1 1 

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 5 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.11 % 


SEQ MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 
SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ GTMYMKTRLFQTHCHNDIVNLLLDCGADVNKCSDEGLTALSMCFLLHYPAQSFKPNVAER 

SEG 

PRD cccchhhhhheeeccccchhhhhcccccccccccccchhhhhhhhhccccccccccceee 

SEQ TIPEPQEPPKFPVVPILSSSFMDTNLESLYYEVNVPSQGSYELRPPPAPLLLPRVSGSHE 
SEG XXXXXXXXXXXX 

PRD eccccccccceeeeeeeccccccccccceeeeeecccccccccccccccccccccccccc 

SEQ GGHFQDTGQCGGSIDHRSSSLKGDSPLVKGSLGHVESGLEDVLGDTDRGSLCSAETKFES 

SEG 

PRD cccccccccccccccccccccccccceeecccccccccccccccccccccceeeeecccc 

SEQ NLCVCDFSIELSQAMLERSAQSHSLLKMASPSPCTSSFDKGTMRRMALSMIE 

SEG 

PRD cccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhccc 


Prosite for DKFZphfbr2_23n16.1 

PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005 
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005 
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005 
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005 
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005 
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 
PS00006 121->125 CK2_PHOSPHO_SITE PDOC00006 
PS00006 140->144 CK2_PHOSPHO_SITE PDOC00006 
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006 
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006 
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006 
PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006 
PS00008 45->51 MYRISTYL PDOC00008 
PS00008 86->92 MYRISTYL PDOC00008 
PS00008 177->183 MYRISTYL PDOC00008 
PS00008 188->194 MYRISTYL PDOC00008 
PS00008 229->235 MYRISTYL PDOC00008 
PS01159 19->44 WW_DOMAIN_1 PDOC50020 


(No Pfam data available for DKFZphfbr2_23n16.1) 


DKFZphfbr2_23o24 


group: brain derived 

DKFZphfbr2_23o24 encodes a novel 139 amino acid protein with similarity to CAAX-box protein. 

The CAAX box is a prenyl group binding site found in a number of eukaryotic proteins, 
including Ras and Ras-like proteins such as Rho, Rab, Rac, Ral, and Rap, as well as 
nuclear lamins A and B, some G protein alpha and gamma subunits and some dnaJ-like 
proteins. These proteins are posttranslationally modified at this site by the attachment of 
either a farnesyl or a geranyl-geranyl group to a cysteine residue. 
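
Whether a peptide ends in a CAAX box can be tested on its last four residues. The sketch below uses the pattern C-{DENQ}-[LIVM]-x anchored at the C-terminus, which is quoted from memory of the PROSITE prenylation entry and should be treated as an assumption; the function name is illustrative.

```python
import re

# C, then any residue except D/E/N/Q, then L/I/V/M, then any residue, at the C-terminus.
CAAX_BOX = re.compile(r"C[^DENQ][LIVM].$")


def has_caax_box(peptide):
    """True if the C-terminal four residues fit the assumed CAAX pattern."""
    return CAAX_BOX.search(peptide.upper()) is not None


# C-terminal residues of the DKFZphfbr2_23o24 peptide (positions 131-139).
print(has_caax_box("LWWGSCSLK"))  # True: the peptide ends in C-S-L-K
print(has_caax_box("IEYKKP"))     # False: no cysteine four residues from the end
```
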

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to lectins 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3564 bp 

Poly A stretch at pos. 3541, no polyadenylation signal found 


1 GAATGGCTCC GCAGATGGCC GGCACTGAGA GCCAGCAAGA AGCGGAGGAG 
51 ATGGGCCTTC AGCAGGGGGT TGCGGGGGGA GCTTTAAACT GAGCCCTGTA 
101 AACATGGCAG AACTGCTCAG TGGGAGACTC TCAGCACAGA CGGTCATGGG 
151 GAAGTGAGTG CAGTTCATTT GTAATCTTGT TGTCGAGTTC TGGGTTTTTT 
201 TTGTTTGTTT CGTAACTTTA AAGGTATGCA CTTTATATAG ATTTATTTAT 
251 TTGCTGGGAC CGTTACTCAG AGTTCCTAGA AATGTACACA GCTTTTTTAC 
301 CAGGGTTACT CCTCAGAATC ACTTGTCACT TCTTTAAATG AATGAATGAA 
351 TGTGCCAGGC CCTATGCCTG GAGGTTGGGA GCTTCATCTA CATCACATTC 
401 TAACAGGTGA CCACTGGGGT AAGCACTGTG TGACTGCAAA GCCAGGGTGT 
451 GTTTCCATCA ACACCCAGAT GACCGTGCCT ATGTGCCCCT GTTGTCCTCC 
501 CTCCAGGACT GCCTCCTCAC CCCACCCCTT TCTGCAGCTC CTCATCTAAA 
551 CATCTCGCCT GGTGAGGTCA CGGCTTAGCC TGTTGGCCAG TGGCCCCACC 
601 ACCATCCTTC CCCCTGTGCA GATTGGAGGA GGCCAGGTCT CTCCCCTTAG 
651 CTCCTATGTC CCCTTCACCC CCCATGCCAC AGATGAGACA TTCACAGAGT 
701 TTGCAGATGA TGGAAGAGAA GACTCCAGGT TGCCAGGTGT GTCCACTCTC 
751 AGGAACCCCC AGCCCAAGCC TCACTGCTCG TGTTCCCAGC CAACCCCAGC 
801 ACGGGGGATA CGCCGGTGCT GTTTCCCTGC TCAGATACAA CCAGTTACCA 
851 GAAACGACCT CACCCCTCCA ACCACTTTCC AAGGTGCCAG GACAGAGAAG 
901 CCCTTCACTG GCCCACCCAG GGCAGTTGAC AGAGGGATGC CCTCCTTGGA 
951 GGGGAGCCTC ACCTCTACCC ACAGGGCCGC GGCCTTGTCC TGGATTCTCA 
1001 CCGGGGCAGT CACGTCAGGA TGGAGAGGTC CCATGTCAGC CAGTTCTTTG 
1051 GTGGGGGTCA TGTAGTCTGA AATGACCTGC CGATGGTCCA GGCTGAGCCA 
1101 GGGAAGCTGA GCCTGGGTGC CTTTTTGGTG CCTACTCTGA CTTGAGTTGG 
1151 ATTCATGCCA CAGACCCACC TTCTTGAGCA ACAACACATA TAGCCACCAA 
1201 CACAAGAGCC AGGCACACAC TGAGCAGAGA AAGTCCCTGT CGCCTCACCA 
1251 CCCAAAAACT CCAGCTTTGC AGAGACCAAG GTTCTTCTCT ACCTTTGCAG 
1301 AAGCCTCTGT GACCAAACCC GGAGCTTGCC CTTCTGAGGC CTCTAGCATT 
1351 TCTCCAGGTG TTTTTCAGAG GACTTGGTTT AAATTTGTTC ACCCCAAATG 
1401 TGGTCTTTCC CGGATCATGA AAGGATCTGC CGCAAAGGTG AATCTGAGTC 
1451 TCCTCAGAGT CATATGAGAC TGAAACTGCT TATAACATTT CCGTGACCTA 
1501 ATAAGTCTTC CAAAAATGTA GGGTATTAAG AGTTTAGTGA CATTAAAAAG 
1551 TTTAGTCGAA AATATCGTGA TTCAGGTATA TTTAGACATT TGATTCATGC 
1601 CAAATTGCCA CTGTTAACAG AAAACACACC CCAAGCACAT TAATGCCTAG 
1651 ATATTTCAAA CCCTTTTCTG CCCACACATT CTTAAAAATA ATATACTGAG 
1701 AAATCTATAT ACAGGTTTTT TTTTAATTAG CTTGGAAAAG AGCAGTTGTA 
1751 TTCTGTTTGA ACAGCTGCTA ATGTCAATTC CTGTGGGAAG AAAGACCAAA 
1801 GAACATGGAG TTACACCAAG AATTTTAAAA CAAAGACGCT GTCCCTTTCC 
1851 TGAGCACCGT GCAGCCAAGA CTGAGAGATC AGTCTGAGAC CTGTGATTAA 
1901 GGAGTGTTTT CTACATAGCG TATAATTATG GAGCCACACA AGTGGGCCAT 
1951 TACTCTGTTG AGTGCTTCAT GTTTGAGGTA TTTTCGTGTT CCAACTTACA 
2001 TTAAAGTGTT TATAAAACAG GAAAAATCCA CGAGCAGGTA TTGACACTAT 
2051 CCATATTAGA TCATCACAAA ATTATATATA TAGCAGAGTC ATAAACAATG 
2101 AGAAACGGTC TTCCCACACT TGCTTTAAAT GGCCATGACC TAGTGTTTAG 
2151 GGAAAGCAGT AAAATCAGCG AGGAGCTCGT GGGAAAAATG AGACGGGCCC 
2201 TGAGGGGGTG ACTCATGGGC CAAGCAGGGC CACACAGGTA CCAGGCCGCC 
2251 ACGTCCTCTC CTGCCTCTCA CTCTCTGGAG ACTGGACTTC CTTTACTGCC 
2301 TCCTTTCTGA CATTTCCTAG ACATCAGACT TTGCTACTTA GTACACAAAC 
2351 GGGGTTCCCT TTTAAATTTG TTCACTCTAG TTAGCATTTG CAGAAGCTGT 
2401 GAAAAATTAC AGAGAGATGA TGTGTTGGGT AAGAGATGGT TTAAAAGTCC 
2451 AGCTTGCTGT TTTTCATTAA GTGTCTTGAA AATGAGTAAG TGGCGTTCCT 
2501 GGAGGGGAAC AATCATATAA TTCCGCAGGG TGGGTCTAAA CTTGTTTTCT 
2551 GATAGTGTTT AGCAGCTCAT GGCTCTGAGG GCACCTGATA ACACAGCAGC 
2601 CAGGCGCTGA TGAGAAGTGT GTGCCAGACA GACCCGAGTG TGGCTTGGCT 
2651 CTTGCCTTAT GTTCCTTTCT CTGTTCAGAG AAGCGTGAGA TGAGATTTTG 
2701 TGATTATATT GCACTCCTTG GGCTGACTTT CCCATGCACA GAATGTTTTA 
2751 CACATCCTGA TAGCTGAGCT GAAAATGCAA AGAGAAGGGA AAATGCCTTA 
2801 AATTGTTCTG GCTAATTTAG AAGCAGCAGG CCTTGGAAGT CTTTGTCCTG 
2851 TGTCCCTGAA CAAATCTTAT GGGAGCTCTG GTACCTATGC CAGAAAATGC 
2901 ACATAGGCAC AACACTTTTA CATACACGTT CACACACCCC ACCCTTATGG 
2951 AGAACTTTTT TCTAAATAAG AGAAAGAAAA ATTTTAAGAC TTACAAGTTA 
3001 TGTTTAGGTA TTTTACATGG TTCAGAAAAC AAGACATGAA GCGGTATAAA 
3051 CTGAGAAGTC TTGTTCCCAC AACCCCACGT GCCAGGTACA CATAACCATT 
3101 TTTATTCACC TCTAGCTTGT GCTTCCAATG TTTGTTAGGC ATATGTAAAT 
3151 AAGTGAATAG ATAAGCATTT CTCCCTCCTT TTGCTGACAT GAGTGGTGGC 
3201 ATGTTTTGCC CCTGGCTTTT ATCCCTTGAC CCCATTCCAG TACCTAGAGA 
3251 CCTGCTTCAT TTTTTTAGAT GTGTAATACT TCATGTGTGC GTGTGCCTTA 
3301 GTGATTAACT CGTGCACTGT GCAGGGACAT CGGGCTGGGA TCAGTTTGTT 
3351 CACTGATATA TACAGCGCTG CGGGAGATAC CCTCACATGT GTATCATTTG 
3401 GTCCATGTGC AGGTGTGTCT GGAAGATAGA ATTCTAGGCG TAGAATTGAT 
3451 AGGTTAAATG TATTTATAGG GAAAAAATCA ATATAAAACT TTGCGTGTAA 
3501 TGATATTTGC GTGCTTTTTT TTTTAATTTT TTTACCCAAA TAGTAAAAAA 
3551 AAAAAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 656 bp to 1072 bp; peptide length: 139 
Category: similarity to known protein 


1 MSPSPPMAQM RHSQSLQMME EKTPGCQVCP LSGTPSPSLT ARVPSQPQHG 
51 GYAGAVSLLR YNQLPETTSP LQPLSKVPGQ RSPSLAHPGQ LTEGCPPWRG 
101 ASPLPTGPRP CPGFSPGQSR QDGEVPCQPV LWWGSCSLK 

BLASTP hits 

Entry CEEGAP7_1 from database TREMBL: 

gene: "EGAP7.1"; Caenorhabditis elegans cosmid EGAP7. 

Score = 123, P = 2.3e-07, identities = 35/103, positives = 44/103 

Entry MMBPC35_1 from database TREMBL: 

Mouse carbohydrate binding protein 35 mRNA, 3' end. 

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103 

Entry A28651 from database PIR: 

galactose-specific lectin - mouse >TREMBL:MMMAC2A_1 Mouse mRNA for 
Mac-2 antigen 

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103 


Alert BLASTP hits for DKFZphfbr2_23o24, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23o24, frame 2 


Report for DKFZphfbr2_23o24.2 


[LENGTH] 139 
[MW] 14748.91 
[pI] 8.90 
[PROSITE] PRENYLATION 1 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 1 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] PKC_PHOSPHO_SITE 1 
[KW] All_Alpha 


SEQ MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHGGYAGAVSLLR 
PRD «=ccchhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccchhhhhhhh 


SEQ YNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRGASPLPTGPRPCPGFSPGQSR 
PRD h^^cccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 


SEQ QDGEVPCQPVLWWGSCSLK 
PRD ccccccccccccccccccc 


Prosite for DKFZphfbr2_23o24.2 


PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005 
PS00006 119->123 CK2_PHOSPHO_SITE PDOC00006 
PS00008 50->56 MYRISTYL PDOC00008 
PS00013 126->137 PROKAR_LIPOPROTEIN PDOC00013 
PS00294 136->140 PRENYLATION PDOC00266 


(No Pfam data available for DKFZphfbr2_23o24.2) 


DKFZphfbr2_23o5 


group: brain derived 

DKFZphfbr2_23o5 encodes a novel 360 amino acid protein with no known similarity. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

potential start at Bp 24 matches Kozak consensus ANNatgG 
Sequenced by AGOWA 
Locus: /map="7q21-q22" 
Insert length: 1736 bp 

Poly A stretch at pos. 1714, polyadenylation signal at pos. 1680 


1 GGGGGAGGAT CAAAGTAGGC AAGATGGCGT CGAGCGGCGG GGAGCCAGGG 
51 AGTTTATTTG ATCACCACGT CCAGAGGGCG GTATGCGACA CACGGGCCAA 
101 ATATCGAGAG GGACGACGGC CTCGTGCTGT GAAGGTATAT ACAATCAATT 
151 TGGAATCTCA GTACTTATTA ATACAAGGAG TTCCTGCTGT GGGAGTCATG 
201 AAGGAATTAG TTGAGCGATT CGCTTTATAT GGTGCAATTG AACAGTACAA 
251 TGCTCTAGAT GAATACCCAG CAGAAGACTT TACTGAAGTT TATCTTATTA 
301 AATTTATGAA CTTACAAAGT GCAAGGACAG CCAAGAGAAA AATGGATGAA 
351 CAGAGTTTCT TCGGTGGATT GCTTCATGTG TGCTATGCTC CAGAATTTGA 
401 AACAGTTGAA GAAACTAGAA AAAAACTACA AATGCGGAAG GCATATGTAG 
451 TAAAAACTAC TGAAAATAAA GACCATTACG TGACAAAGAA GAAATTGGTT 
501 ACAGAGCATA AAGACACAGA GGATTTTAGA CAAGACTTCC ACTCAGAGAT 
551 GTCTGGATTT TGTAAAGCTG CTTTGAACAC TTCTGCAGGG AACTCAAATC 
601 CTTATCTTCC GTATTCCTGT GAATTGCCTT TATGTTATTT CTCCTCAAAA 
651 TGTATGTGTT CATCCGGGGG ACCTGTAGAC AGAGCACCAG ACTCCTCTAA 
701 GGATGGTAGA AACCATCATA AAACAATGGG GCATTATAAC CACAATGACT 
751 CTTTGCGGAA AACACAGATA AACTCTTTGA AAAACTCAGT GGCCTGCCCT 
801 GGTGCACAAA AGGCTATTAC GTCTTCAGAG GCAGTTGACA GATTTATGCC 
851 TAGGACAACA CAACTGCAGG AGCGCAAAAG AAGAAGAGAA GATGATCGTA 
901 AACTTGGAAC TTTTCTTCAA ACAAACCCAA CTGGTAATGA GATTATGATT 
951 GGACCTCTGT TACCAGACAT CTCTAAAGTG GATATGCACG ATGACTCATT 
1001 GAATACAACG GCGAATTTAA TTCGGCATAA ACTTAAAGAG GTATTTCATC 
1051 TGTGCCAAAG CCTCCAGAGG ACAAGCCAGA AGATGTACAT ACAAGTCATC 
1101 CATTAAAACA AAGAAGAAGA ATATAGAGTG CCAGCAGCAA CTTAGTATTT 
1151 TCTAAAAAGA ACATTTATTA TTTATTTTTA GCCTGTCATT TTAATTCTTC 
1201 AAGAGATTTT ACTGCTGGTA TTTTTTGATG CACTCCTCTT TGTAATTTCA 
1251 TTCAAGCCAT TTGTCTAAAG TCATTTCTTT GTTTTTTGGG AGATGGAGTC 
1301 TTGCTCTGTT GCCCAGGCTG GAATGCAGTG GCGTGATCTC GGCTCACTGC 
1351 AACCTCCACC TCCCGGGTTC AAGCGATTCT CCTGCCTCAG CCTCCTGAGT 
1401 ATCTGGGATT ACAGGCGTGC ACCACCATGC CTGGCTAAGT TTTGTGTTTT 
1451 TTTTAGTAGA GATGGGTTTT CACCATATTG GTCAGGCTGG TCTCGAACTC 
1501 CTGACCTTGT GATACACCTG CCTCAGCCTC CCAAAGGGAT GAGCCACCGC 
1551 GCCTGGCCCA TTTCTTCTTT TTTTGACCCA TACTTAATGT TGCAGAAACT 
1601 ATTCTTGTCA TAACATTATC TCTCATGTAC AGTAATTATA TGTAAATTAA 
1651 TTGAAGCAAA TATGGAAACT TTACAATAGA AATAAAGATA GGCAGCCAGC 
1701 GTCTGTTTCC AATTATAAAA AAAAAAAAAA AAAAAA 
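
The annotation above states that the potential start at Bp 24 matches the Kozak consensus ANNatgG. A minimal sketch of that check follows, assuming the position is 1-based and refers to the A of the ATG; the function name `matches_kozak` is illustrative and not part of the original annotation pipeline.

```python
import re

# First 30 bases of the DKFZphfbr2_23o5 insert, copied from the listing above.
INSERT_5PRIME = "GGGGGAGGATCAAAGTAGGCAAGATGGCGT"


def matches_kozak(seq: str, start_1based: int) -> bool:
    """Return True if the ATG at start_1based fits the pattern ANNatgG.

    Assumption: start_1based is the 1-based position of the A in ATG.
    """
    i = start_1based - 1              # 0-based index of the A in ATG
    if i < 3 or i + 4 > len(seq):     # need 3 bases upstream and 1 downstream
        return False
    window = seq[i - 3:i + 4].upper() # positions -3 .. +4 around the ATG
    return re.fullmatch(r"A[ACGT]{2}ATGG", window) is not None


print(matches_kozak(INSERT_5PRIME, 24))
```

Only the two positions named in the consensus (-3 and +4) are tested; the two N positions accept any base.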


BLAST Results 


Entry AC005156 from database EMBL: 

Homo sapiens PAC clone DJ1099C19 from 7q21-q22, complete sequence. 
Score = 2897, P = 2.4e-154, identities = 583/586 
2 exons covering Bp 4.65-1723 


Medline entries 


No Medline entry 


Peptide information for frame 3 
ORF from 24 bp to 1103 bp; peptide length: 360 
Category: similarity to unknown protein 


1 MASSGGEPGS LFDHHVQRAV CDTRAKYREG RRPRAVKVYT INLESQYLLI 
51 QGVPAVGVMK ELVERFALYG AIEQYNALDE YPAEDFTEVY LIKFMNLQSA 
101 RTAKRKMDEQ SFFGGLLHVC YAPEFETVEE TRKKLQMRKA YVVKTTENKD 
151 HYVTKKKLVT EHKDTEDFRQ DFHSEMSGFC KAALNTSAGN SNPYLPYSCE 
201 LPLCYFSSKC MCSSGGPVDR APDSSKDGRN HHKTMGHYNH NDSLRKTQIN 
251 SLKNSVACPG AQKAITSSEA VDRFMPRTTQ LQERKRRRED DRKLGTFLQT 
301 NPTGNEIMIG PLLPDISKVD MHDDSLNTTA NLIRHKLKEV FHLCQSLQRT 
351 SQKMYIQVIH 
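
The ORF coordinates and peptide lengths reported in this section are related by simple arithmetic: the inclusive span in base pairs divided by three gives the number of residues (the stop codon lies outside the reported range). The sketch below checks this and translates the first codons of the ORF with the standard genetic code. The helper names `peptide_length` and `translate` are illustrative assumptions, not the tools used to produce the listing.

```python
# Standard genetic code (NCBI table 1), built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded by an inclusive, 1-based ORF range."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 24-50 of the DKFZphfbr2_23o5 insert, copied from the listing above.
ORF_5PRIME = "ATGGCGTCGAGCGGCGGGGAGCCAGGG"

print(peptide_length(24, 1103))   # ORF range reported for DKFZphfbr2_23o5
print(translate(ORF_5PRIME))      # first nine residues of the peptide
```

The same arithmetic holds for the other ORF ranges quoted in this section (132 to 632, 189 to 1043, and 203 to 3073).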

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23o5, frame 3 

TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome 
II BAC F15K20 genomic sequence, complete sequence., N = 2, Score = 114, 


>TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II 
BAC F15K20 genomic sequence, complete sequence. 
Length = 227 

HSPs: 

Score = 114 (17.1 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11 
Identities = 21/41 (51%), Positives = 29/41 (70%) 

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVV 143 

AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+ 
Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVL 91 

Score = 107 (16.1 bits), Expect = 2.6e-10, Sum P(2) = 2.6e-10 
Identities = 50/191 (26%), Positives = 83/191 (43%) 

Query: 103 akrkmdeqsffggllhvcyapefetveetrkklqmrkayvvkttenkdhyvtkkklvteh 162 

AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+ + T + VT+ 

Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVLARLNPQKEKSTSQ-- VTKL 108 

Query: 163 KDTEDFRQDFHSEMSGFCKAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAP 222 

+ D S + + GN+ P S + YF+S M + V 
Sbjct: 109 AGPALTQTDNVSSQRREMEYQFHR-- GNA-PVTRVSSDQE-- YFASSSMNQTVKTV 159 

Query: 223 DSSKDGRNHHKTMGHYNHNDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQ 282 

K++++H+++N+ P+Q S RP ++Q+Q 

Sbjct: 160 -REKLNKTREENISSLSHCKQIEESG-NQKRLQ-- PSSQTQPEESGNQKRLQP-SSQIQ 213 

Query: 283 -ERKRRREDDRK 293 

+ KR R D+R+ 
Sbjct: 214 FDLKRTRVDNRR 225 

Score = 102 (15.3 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11 
Identities = 22/55 (40%), Positives = 38/55 (69%) 

Query: 26 KYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMKELVERFALYGAIEQY-- NALDE 80 

+Y++ P AV+VYT+ ES+Y++++ VPA+G +L+ F YG +E++ LDE 
Sbjct: 3 RYKD-ETP-AVRVYTVCDESRYMIVRNVPALGCGDDLMRLFMTYGEVEEFAKRKLDE 57 
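
The percentages printed next to the identity and positive counts in the HSPs above can be recomputed from the fractions. The sketch below assumes the percentage is truncated to an integer rather than rounded; this is an observation that fits the figures quoted in this section (for example 29/41 is shown as 70%), not a statement about how the search program is implemented. The function name `blast_percent` is illustrative.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching positions, truncated (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


# (identities, positives, aligned length) for the three HSPs listed above.
HSPS = [(21, 29, 41), (50, 83, 191), (22, 38, 55)]

for identities, positives, length in HSPS:
    print(f"Identities = {identities}/{length} ({blast_percent(identities, length)}%), "
          f"Positives = {positives}/{length} ({blast_percent(positives, length)}%)")
```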


Pedant information for DKFZphfbr2_23o5, frame 3 


Report for DKFZphfbr2_23o5.3 


[LENGTH] 360 
[MW] 41105.85 
[pI] 8.89 
[HOMOL] TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II BAC F15K20 genomic sequence, complete sequence. 5e-12 
[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 2 
[PROSITE] CK2_PHOSPHO_SITE 7 




[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.17 % 


SEQ MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMK 

SEG 

PRD ccccccccceeeecceeeeehhhhhhhhhccccceeeeeeecccceeeeeeccccchhhh 

SEQ ELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSARTAKRKMDEQSFFGGLLHVC 

SEG 

PRD hhhhhhhhhhhhhhhhhhccccccceeeeeeehhhhhhhhhhhhhhhhhccccccceeee 

SEQ YAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEHKDTEDFRQDFHSEMSGFC 

SEG 

PRD eccchhhhhhhhhhhhhhhhheeeeccccceeeeeeeeeeeccccchhhhhhhhhcccce 

SEQ KAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNH 

SEG 

PRD eeeeccccccccccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ NDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT 

SEG xxxxxxxxxxxxxxx 

PRD cccceeeeccccccccccccceeeeecceeeeeccccchhhhhhhhhhhhccceeeeeec 

SEQ NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRTSQKMYIQVIH 

SEG 

PRD cccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhccc 


Prosite for DKFZphfbr2_23o5.3 


PS00001 185->189 ASN_GLYCOSYLATION PDOC00001 
PS00001 241->245 ASN_GLYCOSYLATION PDOC00001 
PS00001 327->331 ASN_GLYCOSYLATION PDOC00001 
PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005 
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005 
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 
PS00005 224->227 PKC_PHOSPHO_SITE PDOC00005 
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006 
PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006 
PS00006 224->228 CK2_PHOSPHO_SITE PDOC00006 
PS00006 266->270 CK2_PHOSPHO_SITE PDOC00006 
PS00006 303->307 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00008 5->11 MYRISTYL PDOC00008 
PS00008 260->266 MYRISTYL PDOC00008 
PS00009 29->33 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfbr2_23o5.3) 
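
The Prosite site positions listed for DKFZphfbr2_23o5.3 can be reproduced with a simple pattern scan. The sketch below is a hedged illustration: the regular expressions are hand translations of the PROSITE patterns for PS00001, PS00005, PS00006, PS00008 and PS00009 as commonly documented and should be treated as assumptions, and `scan` is a hypothetical helper rather than the tool that produced the report. Overlapping matches are kept by wrapping each pattern in a lookahead.

```python
import re

# DKFZphfbr2_23o5 frame 3 peptide (360 residues), copied from the listing above.
PEPTIDE = (
    "MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLI"
    "QGVPAVGVMKELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSA"
    "RTAKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVVKTTENKD"
    "HYVTKKKLVTEHKDTEDFRQDFHSEMSGFCKAALNTSAGNSNPYLPYSCE"
    "LPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNHNDSLRKTQIN"
    "SLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT"
    "NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRT"
    "SQKMYIQVIH"
)

# PROSITE-style patterns written as Python regular expressions (assumed definitions).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",            # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                 # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",                # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",       # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "AMIDATION": r".G[RK][RK]",                       # x-G-[RK]-[RK]
}


def scan(seq: str, regex: str) -> list:
    """Return 1-based start positions of all matches, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer(f"(?=({regex}))", seq)]


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

In the report, the second number of each range is the start plus the pattern length, so a four-residue site starting at 185 is written 185->189.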

DKFZphfbr2_2a2 
group: brain derived 

DKFZphfbr2_2a2 encodes a novel 167 amino acid protein with similarity to human 52K autoantigen Ro/SS-A. 

The novel protein contains a C3HC4 zinc finger ("RING finger") motif. 

This domain is probably involved in mediating protein-protein interactions. Proteins 
containing a RING finger include mammalian V(D)J recombination activating protein 
(RAG1), mouse rpt-1, human rfp, human 52 Kd Ro/SS-A protein and others. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to 52K autoantigen Ro/SS-A - human 

complete cDNA, complete cds, few EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1376 bp 

Poly A stretch at pos. 1355, polyadenylation signal at pos. 1340 

1 GGGGACTCCA AATTAGAAAG GGGACGTCTA GTGGGTTGCC CGGGAGGGGT 
51 GGCGGGAGCG GTCCTGGAAA TAATCTGTCC TCTGTCGCCG GGAACTGGCG 

101 AGGTAGTTCC TTCGCGGTGG AGAGACCTGG AATGGCCAAA TATCAAGGTG 

151 AAGTTCAAAG TTTGAAACTG GATGATGATT CAGTTATAGA AGGAGTAAGC 

201 GACCAAGTAC TTGTGGCAGT TGTGGTCAGT TTCGCTTTGA TTGCTACCCT 

251 GGTATATGCA CTTTTCAGAA ATGTACATCA AAACATTCAC CCAGAAAACC 

301 AGGAGCTAGT AAGGGTACTT CGAGAACAGC TTCAAACAGA ACAGGATGCA 

351 CCTGCTGCCA CTCGACAGCA GTTCTACACT GACATGTACT GTCCCATCTG 

401 CCTGCACCAA GCCTCCTTCC CGGTGGAGAC CAACTGTGGA CATCTTTTTT 

451 GTGGTGCCTG CATTATTGCT TACTGGCGAT ATGGTTCATG GCTTGGGGCA 

501 ATCAGTTGTC CAATCTGTAG ACAAACGGTA ACCTTACTCC TAACAGTATT 

551 TGGTGAAGAT GATCAGTCTC AGGATGTTCT GAGATTGCAT CAGGATATTA 

601 ATGATTATAA CCGGAGATTC TCAGGGCAAC CCTGATCTAT TATGGAGAGA 

651 ATTATGGATC TACCCACTTT ACTGAGGCAT GCATTCAGGG AAATGTTTTC 

701 AGTCGGGGGC CTTTTCTGGA TGTTTCGCAT CAGGATAATA CTTTGTTTAA 

751 [illegible] TTTCTATCTT ATATCACCTC TAGATTTTGT ACCTGAAGCC 

801 TTGTTTGGAA TTCTAGGCTT TCTAGATGAT TTCTTTGTCA TCTTTTTATT 

851 GCTTATCTAC ATCTCTATTA TGTATCGAGA AGTGATAACC CAAAGGCTAA 

901 CTAGATGAAA AAGGAAACAA AACTGAGTTT ACTAGGATAT CTGAGCTAAT 
951 GTAGAACATC AAACAGAAGG ACCCATGGCA GTATAAAGCA ATGAAGCAAT 
1001 [illegible]GTATTAT CTCACAAATA TAAAACCACT ATAAGACAAA CATTTGATTA 
1051 TCATTTGACA AATACCTAGG TATAACTGGA ATTTTCATGT TTGAAGTTCT 
1101 [illegible] TTAGAATTAT AATGATCTAC AGTTGTATCT TGATTCTATG 
1151 TTGTCTGGAA AAAATATGGA ATTATATAAA AAGGGATGCT TTTATATATT 
1201 [illegible] CCAGAATTAC TTAGATTAAT TAGATGTATA GTAAAATATT 
1251 GTTAAATGTC AGTTTATCCA TCTTATCCTT CTCAGCAGGT ACCTATATGA 
1301 TAATATATAG CTGTGAAACT CATCTAAATA TTTTTGTTCC AATAAAATAT 
1351 TATATACTAA AAAAAAAAAA AAAAAA 
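
The reported polyadenylation signal position can be cross-checked by searching for the canonical AATAAA hexamer just upstream of the terminal poly(A) run. The sketch below is an illustration under stated assumptions: the 40-base search window and the function name `polya_signal_offset` are choices made here, and the reported position is interpreted as a 0-based offset because that interpretation agrees with the value 1340 given for this insert.

```python
# Last 76 bases of the DKFZphfbr2_2a2 insert (positions 1301-1376), from the listing above.
TAIL = (
    "TAATATATAG" "CTGTGAAACT" "CATCTAAATA" "TTTTTGTTCC" "AATAAAATAT"
    "TATATACT" + "A" * 18
)
TAIL_OFFSET = 1300  # number of bases preceding TAIL in the full insert


def polya_signal_offset(seq: str, window: int = 40, signal: str = "AATAAA") -> int:
    """0-based offset of the last signal hexamer within `window` bases upstream
    of the terminal poly(A) run, or -1 if none is found."""
    seq = seq.upper()
    body_len = len(seq.rstrip("A"))    # index where the terminal A run begins
    lo = max(0, body_len - window)
    return seq.rfind(signal, lo, body_len)


hit = polya_signal_offset(TAIL)
print(hit + TAIL_OFFSET if hit >= 0 else "no signal found")
```

Restricting the search to a window near the 3' end avoids reporting AATAAA hexamers that occur by chance inside the coding region.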

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 


Peptide information for frame 3 


ORF from 132 bp to 632 bp; peptide length: 167 
Category: similarity to known protein 
Classification: unset 
Prosite motifs: ZINC_FINGER_C3HC4 (102-112) 


1 MAKYQGEVQS LKLDDDSVIE GVSDQVLVAV VVSFALIATL VYALFRNVHQ 
51 NIHPENQELV RVLREQLQTE QDAPAATRQQ FYTDMYCPIC LHQASFPVET 
101 NCGHLFCGAC IIAYWRYGSW LGAISCPICR QTVTLLLTVF GEDDQSQDVL 
151 RLHQDINDYN RRFSGQP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2a2, frame 3 

TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid 
Y38F1A, N = 1, Score = 194, P = 2e-15 

PIR:T05222 hypothetical protein F17I5.130 - Arabidopsis thaliana, N = 
1, Score = 159, P = 1.4e-10 

TREMBLNEW:AB025011_1 gene: "TRIF"; product: "Trif-d"; Mus musculus 
mRNA for Trif-d, complete cds., N = 1, Score = 108, P = 2.6e-06 

PIR:A37241 52K autoantigen Ro/SS-A - human, N = 1, Score = 115, P = 
5e-05 


>TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 
Length = 283 

HSPs: 

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15 
Identities = 52/149 (34%), Positives = 78/149 (52%) 


Query: 16 DSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELVRVLREQLQTEQDAPA 75 

D +E ++ Q+ +A+ VF++ + A Q E RQ+ T++ 

Sbjct: 41 DPDVE-LATQITMAIAVIF-IVKAIFDAWQSRRRQRAASRMDENAE-- RNQIITQRRISE 96 

Query: 76 ATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSWLGA-ISCPICRQTVT 134 

A Q + CPICL ASFPV T+CGH+FC CII YW+ + C +CR T 

Sbjct: 97 ALHQSSHE CPICLANASFPVLTDCGHIFCCECHQYWQQSKAIVTPCDCAMCRSTFY 153 

Query: 135 LLLTV FGEDDQSQDVLRLHQ-DINDYNRRFS 164 

+LL V G +++ D ++ + I+DYNRRFS 

Sbjct: 154 MLLPVHWPTMGTSEETDDHIQENNIRIDDYNRRFS 188 

Pedant information for DKFZphfbr2_2a2, frame 3 


Report for DKFZphfbr2_2a2.3 

[LENGTH] 167 
[MW] 18941.65 
[pI] 4.91 
[HOMOL] TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 1e-13 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 1e-04 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 1e-04 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR323c] 2e-04 
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 
[PROSITE] ZINC_FINGER_C3HC4 1 
[PFAM] Zinc finger, C3HC4 type (RING finger) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 6.59 % 

SEQ makyqgevqslkldddsviegvsdqvlvavvvsfaliatlvyalfrnvhqnihpenqelv 

SEG xxxxxxxxxxx 

1rmd- 

SEQ rvlreqlqteqdapaatrqqfytdmycpiclhqasfpvetncghlfcgaciiaywrygsw 

SEG 

1rmd- HHHHHHBTTTTTEETTTEEEETTTEEEEHHHHH---HHHHH 


SEQ lgaiscpicrqtvtllltvfgeddqsqdvlrlhqdindynrrfsgqp 
SEG 

1rmd- HCCB-TTTTT 


Prosite for DKFZphfbr2_2a2.3 

PS00518 102->112 ZINC_FINGER_C3HC4 PDOC00449 


Pfam for DKFZphfbr2_2a2.3 


HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW cP 
CPIC L+ P++++CGH+FC +CI+ + **"CP 
Query 87 CPIC LHQ-- ASFPVETNCGHLFCGACIIAYWRYGSWLGAISCP 127 

HMM mC* 
+C 
Query 128 IC 129 
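
The ZINC_FINGER_C3HC4 hit at 102->112 can be reproduced by converting a PROSITE-syntax pattern into a regular expression. The sketch below assumes the PS00518 signature is C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]; that pattern is quoted from memory of the PROSITE entry and should be verified against the database. The converter `prosite_to_regex` is a hypothetical helper that handles only the basic element types (single residues, x, [..], {..} and repeat counts).

```python
import re

# DKFZphfbr2_2a2 frame 3 peptide (167 residues), copied from the listing above.
PEPTIDE = (
    "MAKYQGEVQSLKLDDDSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQ"
    "NIHPENQELVRVLREQLQTEQDAPAATRQQFYTDMYCPICLHQASFPVET"
    "NCGHLFCGACIIAYWRYGSWLGAISCPICRQTVTLLLTVFGEDDQSQDVL"
    "RLHQDINDYNRRFSGQP"
)

# Assumed PROSITE signature for the C3HC4 RING finger (PS00518).
RING_PATTERN = "C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]"

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex(RING_PATTERN)
hit = re.search(regex, PEPTIDE)
print(regex)
if hit:
    # Report in the listing's convention: 1-based start, end = start + length.
    print(f"{hit.start() + 1}->{hit.end() + 1}")
```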
DKFZphfbr2_2b17 


group: transmembrane protein 

DKFZphfbr2_2b17 encodes a novel 285 amino acid protein with similarity to D. melanogaster 30K 
protein. 

The protein contains 3 transmembrane regions. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 


similarity to Drosophila hypothetical 30K protein 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 3 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1426 bp 

Poly A stretch at pos. 1345, polyadenylation signal at pos. 1330 

1 GGGGGTATTT CCAAGGACTC CAAAGCGAGG CCGGGGACTG AAGGTGTGGG 
51 TGTCGAGCCC TCTGGCAGAG GGTTAACCTG GGTCAAATGC ACGGATTCTC 
101 ACCTCGTACA GTTACGCTCT CCCGCGGCAC GTCCGCGAGG ACTTGAAGTC 
151 CTGAGCGCTC AAGTTTGTCC GTAGGTCGAG AGAAGGCCAT GGAGGTGCCG 
201 CCACCGGCAC CGCGGAGCTT TCTCTGTAGA GCATTGTGCC TATTTCCCCG 
251 AGTCTTTGCT GCCGAAGCTG TGACTGCCGA TTCGGAAGTC CTTGAGGAGC 
301 GTCAGAAGCG GCTTCCCTAC GTCCCAGAGC CCTATTACCC GGAATCTGGA 
351 TGGGACCGCC TCCGGGAGCT GTTTGGCAAA GATGAACAGC AGAGAATTTC 
401 AAAGGACCTT GCTAATATCT GTAAGACGGC GGCTACAGCA GGCATCATTG 
451 GCTGGGTGTA TGGGGGAATA CCAGCTTTTA TTCATGCTAA ACAACAATAC 
501 ATTGAGCAGA GCCAGGCAGA AATTTATCAT AACCGGTTTG ATGCTGTGCA 
551 ATCTGCACAT CGTGCTGCCA CACGAGGCTT CATTCGTTAT GGCTGGCGCT 
601 GGGGTTGGAG AACTGCAGTG TTTGTGACTA TATTCAACAC AGTGAACACT 
651 AGTCTGAATG TATACCGAAA TAAAGATGCC TTAAGCCATT TTGTAATTGC 
701 AGGAGCTGTC ACGGGAAGTC TTTTTAGGAT AAACGTAGGC CTGCGTGGCC 
751 TGGTGGCTGG TGGCATAATT GGAGCCTTGC TGGGCACTCC TGTAGGAGGC 
801 CTGCTGATGG CATTTCAGAA GTACTCTGGT GAGACTGTTC AGGAAAGAAA 
851 ACAGAAGGAT CGAAAGGCAC TCCATGAGCT AAAACTGGAA GAGTGGAAAG 
901 GCAGACTACA AGTTACTGAG CACCTCCCTG AGAAAATTGA AAGTAGTTTA 
951 CAGGAAGATG AACCTGAGAA TGATGCTAAG AAAATTGAAG CACTGCTAAA 
1001 CCTTCCTAGA AACCCTTCAG TAATAGATAA ACAAGACAAG GACTGAAAGT 
1051 GCTCTGAACT TGAAACTCAC TGGAGAGCTG AAGGGAGCTG CCATGTCCGA 
1101 TGAATGCCAA CAGACAGGCC ACTCTTTGGT CAGCCTGCTG ACAAATTTAA 
1151 GTGCTGGTAC CTGTGGTGGC AGTGGCTTGC TCTTGTCTTT TTCTTTTCTT 
1201 TTTAACTAAG AATGGGGCTG TTGTACTCTC ACTTTACTTA TCCTTTWUiTT 
1251 TAAATACATA CTTATGTTTG TATTAATCTA TCAATATATG CATACATGAA 
1301 TATATCCACC CACCTAGATT TTAAGCAGTA AATAAAACAT TTCGCAAAAG 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAA 


BLAST Results 


Entry HSG19630 from database EMBL: 
human STS A001T27. 
Score = 961, P = 1.2e-36, identities = 193/194 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 189 bp to 1043 bp; peptide length: 285 
Category: similarity to unknown protein 




1 MEVPPPAPRS FLCRALCLFP RVFAAEAVTA DSEVLEERQK RLPYVPEPYY 
51 PESGWDRLRE LFGKDEQQRI SKDLANICKT AATAGIIGWV YGGIPAFIHA 
101 KQQYIEQSQA EIYHNRFDAV QSAHRAATRG FIRYGWRWGW RTAVFVTIFN 
151 TVNTSLNVYR NKDALSHFVI AGAVTGSLFR INVGLRGLVA GGIIGALLGT 
201 PVGGLLMAFQ KYSGETVQER KQKDRKALHE LKLEEWKGRL QVTEHLPEKI 
251 ESSLQEDEPE NDAKKIEALL NLPRNPSVID KQDKD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2b17, frame 3 

PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly 
(Drosophila melanogaster), N = 1, Score = 312, P = 6.1e-28 


>PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly 
(Drosophila melanogaster) 
Length = 261 


HSPs: 

Score = 312 (46.8 bits), Expect = 6.1e-28, P = 6.1e-28 
Identities = 68/231 (29%), Positives = 125/231 (54%) 

Query: 30 ADSEVLEERQKRLPYVPEPYYPESGWDRLRELFGKDEQQRISKDLANICKTAATAGIIGW 89 
AD V +E + ++ E+G +RL+++F DE I +L ++ + +IG 
Sbjct: 23 ADEI VDKENKTYKAFLASKPPEETGLERLKQMm DEFGSI FSELNS VYQAGFLGFLIGA 82 

Query: 90 VYGGIPAFIHAKQQYIEQSQAEIYHNRFDAVQSAHRAATRGFIRYGWRWGWRTAVFVTIF 149 
+YGG+ A ++E +QA + + FDA + T F + G++WGWR +F T + 
Sbjct: 83 IYGGVTQSRVAYMNFMENNQATAFKSHFDAKKKLQDQFTVNFAKGGFKWGWRVGLFTTSY 142 

Query: 150 NTVNTSLNVYRNKDALSHFVIAGAVTGSLFRINVGLRGLVAGGIIGALLGTPVGGLLMAF 209 
+ T ++VYR K ++ . ++ AG++TGSL+++++GLRG+ AGGIIG LG G + 
Sbjct: 143 FGIITCMSVYRGKSSIYEYLAAGSITGSLYKVSLGLRGMAAGGIIGGFLGGVAGVTSLLL 202 

Query: 210 QK YSGETVQERKQKDRKALHELKLEEWKGRLQVTEHLPEKIESSLQEDEPE 260 
K SG +++E ++ ++K RL E++ + + +++ PE 
Sbjct: 203 MKASGTSMEE VRYWQYKWRLDRDENIQQAFKKLTEDENPE 242 


Pedant information for DKFZphfbr2_2b17, frame 3 


Report for DKFZphfbr2_2b17.3 


[LENGTH] 285 
[MW] 32177.88 
[pI] 8.65 
[HOMOL] PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly (Drosophila melanogaster) 7e-20 
[PROSITE] MYRISTYL 7 
[PROSITE] CK2_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] SIGNAL_PEPTIDE 25 
[KW] TRANSMEMBRANE 3 
[KW] LOW_COMPLEXITY 5.96 % 


SEQ MEVPPPAPRSFLCRALCLFPRVFAAEAVTADSEVLEERQKRLPYVPEPYYPESGWDRLRE 

SEG 

PRD cccccccceeeeeeeeeehhhhhhhhhhhhhhhhhhhi^ 

MEM 

SEQ LFGKDEQQRISKDLANICKTAATAGIIGWVYGGIPAFIHAKQQYIEQSQAEIYHNRFDAV 
SEG 

PRD hhcccchhhhhhhhhhhhhhhhcccceeeeccccchhhhhhhhhhhhhh^^ 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QSAHRAATRGFIRYGWRWGWRTAVFVTIFNTVNTSLNVYRNKDALSHFVIAGAVTGSLFR 

SEG 

PRD hhhhhhhhhhhccccccccceeeeeeeeccccccceeecccccccceeeeeccccc^^ 

MEM MMMMMMMMMMMMMMMMM^4MMMMMMMM M 

SEQ INVGLRGLVAGGIIGALLGTPVGGLLMAFQKYSGETVQERKQKDRKALHELKLEEWKGRL 
SEG xxxxxxxxxxxxxxxxx 

PRD eecccccccccceeeeeccccccchhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMM^ffllMMMMMMMMMMMM 

SEQ QVTEHLPEKIESSLQEDEPENDAKKIEALLNLPRNPSVIDKQDKD 

SEG 

PRD ccccccccchhhhhccccccchhhhhhhhhhcccccceeeccccc 

MEM 


Prosite for DKFZphfbr2_2b17.3 


PS00001 153->157 ASN_GLYCOSYLATION PDOC00001 
PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006 
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006 
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006 
PS00006 253->257 CK2_PHOSPHO_SITE PDOC00006 
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006 
PS00008 92->98 MYRISTYL PDOC00008 
PS00008 172->178 MYRISTYL PDOC00008 
PS00008 187->193 MYRISTYL PDOC00008 
PS00008 191->197 MYRISTYL PDOC00008 
PS00008 195->201 MYRISTYL PDOC00008 
PS00008 199->205 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_2b17.3) 
DKFZphfbr2_2b5 


group: cell structure and motility 

DKFZphfbr2_2b5 encodes a novel 957 amino acid protein with strong similarity to collagens. 

The novel protein contains the typical (xxG)n repeat of collagen proteins and a 
Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a new collagen 
alpha chain. 

The new protein can find application in modulation of connective tissue, bone and cartilage 
development and maintenance. 


similarity to collagen proteins 

shows typical (xxG)n repeat of collagen proteins 
[PFAM] von Willebrand factor type A domain 

Sequenced by Qiagen 

Locus: /map="6" 

Insert length: 4160 bp 

Poly A stretch at pos. 4141, polyadenylation signal at pos. 4119 


1 GGGGGCCCGC TGCAGGGAGA ACGGACTCCG GGCGGAGGGC AGCCAATCCG 
51 TTTCAGCGCA GGTCTTGCTC GGGTTGGGCl TGCCACTGCC TGGAACATAC 
101 CTGTCCCCCT GGCGCAACAC TCAGCTGGCT GCGACCGCAA CCCCGAGCCT 
151 GGACACTGCG CCAGGAATCC TAAAACCAAA ATATTAGAAC GAAAACAGAA 
201 ACATGGCTCA CTATATTACA TTTCTCTGCA TGGTTTTGGT GCTGCTTCTT 
251 CAGAATTCTG TGTTAGCTGA AGATGGGGAA GTAAGATCAA GTTGTCGTAC 
301 TGCTCCGACA GATTTAGTTT TCATCTTAGA TGGCTCTTAT AGTGTTGGCC 
351 CAGAAAACTT TGAAATAGTG AAAAAGTGGC TTGTCAATAT CACAAAAAAC 
401 TTTGACATAG GGCCGAAGTT TATTCAAGTT GGAGTGGTTC AATATAGTGA 
451 CTACCCTGTG CTGGAGATTC CTCTCGGAAG CTATGATTCA GGAGAACATT 
501 TGACGGCAGC AGTGGAATCC ATACTCTACT TAGGAGGAAA CACAAAGACA 
551 GGGAAGGCCA TCCAGTTTGC GCTCGATTAC CTTTTTGACA AGTCCTCACG 
601 ATTTCTGACT AAGATAGCAG TGGTACTTAC GGATGGCAAG TCCCAAGATG 
651 ACGTCAAGGA TGCAGCTCAA GCAGCAAGAG ATAGTAAGAT AACATTATTT 
701 GCTATTGGTG TTGGTTCAGA AACAGAAGAT GCCGAACTTA GAGCTATTGC 
751 CAACAAGCCT TCGTCTACTT ATGTGTTTTA TGTGGAAGAC TATATTGCAA 
801 TATCCAAAAT AAGGGAAGTG ATGAAGCAGA AACTTTGTGA AGAATCTGTC 
851 TGTCCAACAC GAATTCCAGT GGCAGCTCGT GATGAAAGGG GATTTGATAT 
901 TCTTTTGGGT TTAGATGTAA ATAAAAAGGT TAAGAAAAGA ATACAGCTTT 
951 CACCAAAAAA GATAAAAGGA TATGAAGTAA CATCAAAAGT TGATTTATCA 
1001 GAACTCACAA GCAATGTTTT CCCAGAAGGT CTTCCTCCAT CATATGTATT 
1051 TGTGTCTACT CAAAGATTTA AAGTCAAGAA AATTTGGGAT TTATGGAGAA 
1101 TATTAACTAT TGATGGAAGG CCACAAATAG CAGTTACCTT AAATGGTGTG 
1151 GACAAAATCT TATTATTTAC AACAACCAGC GTAATTAATG GCTCACAAGT 
1201 GGTTACCTTT GCTAACCCTC AAGTTAAGAC GTTGTTTGAT GAAGGCTGGC 
1251 ACCAAATTCG TCTCTTAGTA ACAGAACAAG ATGTGACTTT GTATATTGAT 
1301 GACCAACAAA TTGAAAACAA GCCCTTACAT CCAGTTTTAG GGATCTTGAT 
1351 CAATGGGCAA ACCCAAATTG GAAAATATTC TGGAAAAGAA GAAACTGTTC 
1401 AGTTTGATGT CCAAAAGTTG CGAATCTACT GTGACCCAGA ACAGAACAAG 
1451 CGGGAGACAG CATGTGAGAT TCCTGGATTT AATGGAGAGT GCCTTAATGG 
1501 TCCCAGTGAT GTAGGTTCAA CTCCAGCTCC CTGTATTTGT CCTCCGGGAA 
1551 AACCAGGACT TCAAGGCCCC AAAGGTGACC CTGGACTGCC TGGGAACCCT 
1601 GGCTACCCTG GACAACCTGG TCAAGATGGT AAGCCTGGAT ATCAGGGAAT 
1651 TGCAGGGAGA CCAGGTGTTC CAGGATCTCC AGGAATACAA GGAGCTCGAG 
1701 GACTACCAGG TTACAAAGGA GAACCAGGGC GAGATGGTGA CAAGGGTGAT 
1751 CGTGGACTTC CTGGTTTTCC TGGGCTTCAT GGCATGCCAG GATCAAAGGG 
1801 TGAAATGGGT GCCAAAGGAG ACAAAGGATC ACCTGGATTT TATGGCAAAA 
1851 AGGGTGCAAA AGGTGAAAAG GGGAATGCTG GCTTCCCTGG CCTCCCTGGA 
1901 CCTGCTGGAG AACCAGGAAG ACATGGAAAG GATGGATTAA TGGGTAGTCC 
1951 CGGTTTCAAG GGAGAAGCAG GATCCCCTGG TGCTCCGGGG CAGGATGGAA 
2001 CACGGGGAGA GCCTGGAATC CCAGGATTTC CTGGAAACCG AGGATTAATG 
2051 GGCCAAAAGG GAGAAATTGG GCCTCCAGGA CAGCAAGGAA AAAAAGGAGC 
2101 CCCAGGGATG CCTGGTTTAA TGGGAAGCAA TGGCTCACCA GGCCAGCCTG 
2151 GAACACCGGG ATCTAAGGGA AGCAAAGGTG AACCTGGAAT TCAAGGGATG 
2201 CCTGGGGCTT CAGGGCTCAA GGGAGAACCA GGAGCAACGG GTTCCCCAGG 
2251 AGAACCAGGA TACATGGGTT TACCCGGGAT TCAAGGAAAA AAGGGGGACA 
2301 AAGGAAATCA AGGTGAAAAA GGTATTCAGG GTCAAAAGGG AGAAAATGGA 
2351 AGACAGGGAA TTCCAGGGCA ACAGGGAATT CAAGGCCATC ATGGTGCAAA 
2401 AGGAGAGAGA GGTGAAAAGG GAGAACCTGG TGTCCGAGGT GCCATTGGAT 
2451 CAAAAGGAGA ATCTGGGGTG GATGGCTTGA TGGGGCCCGC AGGTCCTAAG 
2501 GGGCAACCTG GGGATCCAGG TCCTCAGGGA CCCCCAGGTT TGGATGGGAA 
2551 GCCCGGAAGA GAGTTTTCAG AACAATTTAT TCGACAAGTT TGCACAGATG 
2601 TAATAAGAGC CCAGCTACCA GTCTTACTTC AGAGTGGAAG AATTAGAAAT 
2651 TGTGATCATT GCCTGTCCCA ACATGGCTCC CCGGGTATTC CTGGGCCACC 
2701 TGGTCCGATA GGCCCAGAGG GTCCCAGAGG ATTACCTGGT TTGCCAGGAA 
2751 GAGATGGTGT TCCTGGATTA GTGGGTGTCC CTGGACGTCC AGGTGTCAGA 
2801 GGATTAAAAG GCCTACCAGG AAGAAATGGG GAAAAAGGGA GCCAAGGGTT 
2851 TGGGTATCCT GGAGAACAAG GTCCTCCTGG TCCCCCAGGT CCAGAGGGCC 
2901 CTCCTGGAAT AAGCAAAGAA GGTCCTCCAG GAGACCCAGG TCTCCCTGGC 
2951 AAAGATGGAG ACCATGGAAA ACCTGGAATC CAAGGGCAAC CAGGCCCCCC 
3001 AGGCATCTGC GACCCATCAC TATGTTTTAG TGTAATTGCC AGAAGAGATC 
3051 CGTTCAGAAA AGGACCAAAC TATTAGTGTC TGATGCCTCA TTCAGCAGCC 
3101 TAGGCATGGT GCTTTTTCTG TGGTCTTTTG CATCTCAGGA AGATAACCAA 
3151 CAGTATCCCT TGAAAAGAAA CTTAAGTACC TCGGTGTTTT TATTTTTTTT 
3201 TTCTTATGGA AAAAAATATA AAAGATCACA TATACTGATT TTAAAGGCTC 
3251 CTCAGTCATT TGGAGCCCTT GGATTAGCAG CATTAATTAA ATCTCAAGGG 
3301 TTTCTTGTAA AGTCCATTTA TGTTAATCAA AGTTGAATAT AAAAATCCAC 
3351 CATTGCCTGT TAGCCAGTCA GTTTTAGTCA CTGTGAAATA TTTCACATTC 
3401 AGCCTCCATG CAGTAGAGAT TTGAGTTTAA TTTCATGTCC ATGTGACTTT 
3451 CATGTTTCCT ATCTCATAGC TCATGCTACT ACATAAGCCA AAACATGTAT 
3501 CTCATCATTG GAAGTAAGAT CAGGGCTGAT ATTCACCTGG GATAGACAGT 
3551 ATTGGTGAAC TACTCATTTA CTACAGTGTC TCAGCCTTGA TAAAGGGCAG 
3601 TGGATTGCCT GTTGTTCGGT GTTGTGAATA GCACCTCTGA ATAAGATTAG 
3651 AGTGTTTCTT AATTCATTTC AAACTCTAAA ATTAGATTAA TGGTGGTGCT 
3701 AAGT^GAGT ATTAATTACT TTGGGAATGG TCAAAATTAA CATTAAAAAC 
3751 ATTTTAGACA AAAAGTTTCA TTGTACATTC AAAGAAAATG TAAGTTTGGA 
3801 AGTACTAAAA GACTATTTTA TACTTGTTGA TTAATCGGAA TGTTTGTTGT 
3851 ATGCCTTCAT TTTCCATTTC ACTTATATGT GCATGTCCAT ATATGTTAAT 
3901 TTTCATTGTA GCAAAGCTAA TGGAAATAAA GCTAATGCTC TAGTTGAAAG 
3951 AAAAGGAAT^ CTCCTGAAAT CCTAGAATGT CTTGTTATTT TTAGCTGACT 
4001 GTAAAATATT ATGAACAGTC TTTGTGTATT GTGCTTAATG CTTTTGTAAG 
4051 AAACAGAATT TGAAATATTT CATCCTTGTC ATGCTCAAAA TTTTGTTACA 
4101 TGCTTGTTAT TCAGAGTATA ATAAAGTTTT GTACAGGCCT GAAAAAAAAA 
4151 AAAAAAAAAA 


BLAST Results 


Entry HS682J15 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 682J15 
Score = 6240, P = 0.0e+00, identities = 1256/1263 
13 exons matching Bp 2015-4118 

Entry HS708F5 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 708F5 
Score = 2775, P = 1.0e-221, identities = 739/912 
10 exons matching Bp 5-1745 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 203 bp to 3073 bp; peptide length: 957 
Category: similarity to known protein 


1 MAHYITFLCM VLVLLLQNSV LAEDGEVRSS CRTAPTDLVF ILDGSYSVGP 
51 ENFEIVKKWL VNITKNFDIG PKFIQVGVVQ YSDYPVLEIP LGSYDSGEHL 
101 TAAVESILYL GGNTKTGKAI QFALDYLFDK SSRFLTKIAV VLTDGKSQDD 
151 VKDAAQAARD SKITLFAIGV GSETEDAELR AIANKPSSTY VFYVEDYIAI 
201 SKIREVMKQK LCEESVCPTR IPVAARDERG FDILLGLDVN KKVKKRIQLS 
251 PKKIKGYEVT SKVDLSELTS NVFPEGLPPS YVFVSTQRFK VKKIWDLWRI 
301 LTIDGRPQIA VTLNGVDKIL LFTTTSVING SQVVTFANPQ VKTLFDEGWH 
351 QIRLLVTEQD VTLYIDDQQI ENKPLHPVLG ILINGQTQIG KYSGKEETVQ 
401 FDVQKLRIYC DPEQNNRETA CEIPGFNGEC LNGPSDVGST PAPCICPPGK 
451 PGLQGPKGDP GLPGNPGYPG QPGQDGKPGY QGIAGTPGVP GSPGIQGARG 
501 LPGYKGEPGR DGDKGDRGLP GFPGLHGMPG SKGEMGAKGD KGSPGFYGKK 
551 GAKGEKGNAG FPGLPGPAGE PGRHGKDGLM GSPGFKGEAG SPGAPGQDGT 
601 RGEPGIPGFP GNRGLMGQKG EIGPPGQQGK KGAPGMPGLM GSNGSPGQPG 
651 TPGSKGSKGE PGIQGMPGAS GLKGEPGATG SPGEPGYMGL PGIQGKKGDK 
701 GNQGEKGIQG QKGENGRQGI PGQQGIQGHH GAKGERGEKG EPGVRGAIGS 
751 KGESGVDGLM GPAGPKGQPG DPGPQGPPGL DGKPGREFSE QFIRQVCTDV 
801 IRAQLPVLLQ SGRIRNCDHC LSQHGSPGIP GPPGPIGPEG PRGLPGLPGR 
851 DGVPGLVGVP GRPGVRGLKG LPGRNGEKGS QGFGYPGEQG PPGPPGPEGP 
901 PGISKEGPPG DPGLPGKDGD HGKPGIQGQP GPPGICDPSL CFSVIARRDP 
951 FRKGPNY 
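
The description of DKFZphfbr2_2b5 refers to the typical (xxG)n repeat of collagens, that is, a glycine at every third position. One way to locate such a region is to find the longest in-frame run of G-x-x triplets. The sketch below is an illustrative approach with a hypothetical function name `longest_gxy_run`; it is not the method used to annotate the clone.

```python
def longest_gxy_run(seq: str) -> tuple:
    """Return (1-based start, number of triplets) of the longest run of
    consecutive G-x-x triplets; (0, 0) if the sequence has no complete triplet
    starting with G. Ties are resolved in favour of the leftmost run."""
    n = len(seq)
    run = [0] * (n + 3)            # run[i] = triplets in the run starting at i
    best_start, best_len = -1, 0
    for i in range(n - 3, -1, -1): # right to left so run[i + 3] is already known
        if seq[i] == "G":
            run[i] = 1 + run[i + 3]
            if run[i] >= best_len:
                best_start, best_len = i, run[i]
    return (best_start + 1, best_len) if best_len else (0, 0)


# Residues 444-470 of the DKFZphfbr2_2b5 peptide, copied from the listing above.
FRAGMENT = "CICPPGKPGLQGPKGDPGLPGNPGYPG"

print(longest_gxy_run(FRAGMENT))
```

Applied to the full 957-residue peptide, the same function would report where the longest uninterrupted triple-helical stretch begins and how many triplets it spans.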

BLASTP hits 
Entry HSCOL7A1X_1 from database TREMBL: 

gene: "COL7A1"; product: "collagen type VII"; Homo sapiens (clones - 
CW52-2, CW27-6, CW15-2, CW26-5, 11-67) collagen type VII intergenic 
region and (COL7A1) gene, complete cds. 

Score = 949, P = 3.4e-122, identities = 237/553, positives = 281/553 

Entry CA17_HUMAN from database SWISSPROT: 

COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LC 
COLLAGEN), >TREMBL:HSCOL7A1_1 gene: "COL7A1"; product: "alpha-1 type 
VII collagen"; Human alpha-1 type VII collagen (COL7A1) mRNA, complete 
cds. 

Score = 949, P = 3.6e-122, identities = 237/553, positives = 281/553 


Alert BLASTP hits for DKFZphfbr2_2b5, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2b5, frame 2 


Report for DKFZphfbr2_2b5.2 


[LENGTH] 957 
[MW] 99413.38 
[pI] 8.49 
[HOMOL] PIR:A40020 collagen alpha 1(XII) chain precursor - chicken 9e-90 
[BLOCKS] BL01119B Copper-fist domain proteins 
[BLOCKS] BL00313B 
[BLOCKS] BL01113A C1q domain proteins 
[BLOCKS] BL00420A Speract receptor repeat proteins domain proteins 
[SCOP] d1zoob_ 3.45.1.1.1 integrin CD11a/CD18 (LFA-1) (Human (Hom 2e-58 
[SCOP] d1ido 3.45.1.1.2 Integrin CR3 (CD11b/CD18), alpha subunit (Huma 8e-62 
[EC] 3.1.1.7 Acetylcholinesterase 7e-24 
[PIRKW] blocked amino end 1e-43 
[PIRKW] duplication 7e-46 
[PIRKW] cornea 1e-35 
[PIRKW] lung 2e-40 
[PIRKW] leukocyte 1e-42 
[PIRKW] skin 1e-40 
[PIRKW] transmembrane protein 1e-37 
[PIRKW] cartilage 3e-59 
[PIRKW] hydroxylysine 4e-62 
[PIRKW] connective tissue 3e-43 
[PIRKW] triple helix 5e-82 
[PIRKW] homotrimer 2e-37 
[PIRKW] bone 6e-40 
[PIRKW] Alport syndrome 1e-42 
[PIRKW] laminin binding 2e-40 
[PIRKW] liver 2e-40 
[PIRKW] glycoprotein 5e-82 
[PIRKW] carboxylic ester hydrolase 7e-24 
[PIRKW] disulfide bond 7e-46 
[PIRKW] cell binding 7e-46 
[PIRKW] heterotrimer 4e-62 
[PIRKW] calcium binding 8e-28 
[PIRKW] alternative splicing 5e-82 
[PIRKW] coiled coil 5e-82 
[PIRKW] basement membrane 7e-46 
[PIRKW] trimer 5e-82 
[PIRKW] pyroglutamic acid 3e-43 
[PIRKW] hydroxyproline 4e-62 
[PIRKW] extracellular matrix 5e-82 
[PIRKW] chondroitin sulfate proteoglycan 6e-41 
[PIRKW] sulfoprotein 7e-39 
[PIRKW] kidney 1e-42 
[PIRKW] angiogenesis inhibitor 6e-36 
[PIRKW] Ehlers-Danlos syndrome 2e-40 
[SUPFAM] fibronectin type III repeat homology 5e-82 
[SUPFAM] scavenger receptor cysteine-rich domain homology 1e-37 
[SUPFAM] C-type lectin homology 6e-30 
[SUPFAM] collagen alpha 2(I) chain 5e-40 
[SUPFAM] collagen alpha 1(I) chain 6e-44 
[SUPFAM] fibrillar collagen carboxyl-terminal homology 6e-44 
[SUPFAM] animal Kunitz-type proteinase inhibitor homology 2e-38 
[SUPFAM] fibronectin type II repeat homology 6e-21 
[SUPFAM] complement C1q carboxyl-terminal homology 1e-38 
[SUPFAM] collagen alpha 3(VI) chain 2e-31 
[SUPFAM] collagen alpha 1(IV) chain 7e-46 
[SUPFAM] collagen alpha 1(VI) chain 2e-37 
[SUPFAM] von Willebrand factor type C repeat homology 6e-44 
[SUPFAM] unassigned collagens 4e-62 
[SUPFAM] von Willebrand factor type A repeat homology 5e-82 
[SUPFAM] collagen alpha 1(XIV) chain 5e-82 
[SUPFAM] pulmonary surfactant protein D 6e-30 
[SUPFAM] collagen alpha 1(V) chain 7e-39 
[SUPFAM] collagen alpha 1(VIII) chain 1e-38 
[SUPFAM] EGF homology 1e-35 
[PROSITE] AMIDATION 3 
[PROSITE] MYRISTYL 14 
[PROSITE] CK2_PHOSPHO_SITE 13 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] von Willebrand factor type A domain 
[KW] Irregular 
[KW] 3D 
[KW] SIGNAL_PEPTIDE 23 
[KW] LOW_COMPLEXITY 24.24 % 


SEQ MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL 
SEG 
1atzB CCCEEEEEEEECCCCCCHHHHHHHHHHH 

SEQ VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI 
SEG 
1atzB HHHHHHCCBTTTTEEEEEEEETTTEEEEETTTTTTTHHHHHHHHHHCCCCCCCCCHHHHH 

SEQ QFALDYLFDKSSRFLTKIAVVLTDGKSQDDVKDAAQAARDSKITLFAIGVGSETEDAELR 
SEG 
1atzB HHHHHHHHCCTTTTTEEEEEEEECCCTTTTHHHHHHHHHHHCEEEEEEEECCCCCHHHHH 

SEQ AIANKPSSTYVFYVEDYIAISKIREVMKQKLCEESVCPTRIPVAARDERGFDILLGLDVN 
SEG 
1atzB HHHGGGGGGGCECCHHHHHHHHHCHHHHHHHH 

SEQ KKVKKRIQLSPKKIKGYEVTSKVDLSELTSNVFPEGLPPSYVFVSTQRFKVKKIWDLWRI 
SEG 
1atzB 

SEQ LTIDGRPQIAVTLNGVDKILLFTTTSVINGSQVVTFANPQVKTLFDEGWHQIRLLVTEQD 
SEG 
1atzB 

SEQ VTLYIDDQQIENKPLHPVLGILINGQTQIGKYSGKEETVQFDVQKLRIYCDPEQNNRETA 
SEG 
1atzB 

SEQ CEIPGFNGECLNGPSDVGSTPAPCICPPGKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGY 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
1atzB 

SEQ QGIAGTPGVPGSPGIQGARGLPGYKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGD 
SEG xx 
1atzB 

SEQ KGSPGFYGKKGAKGEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQDGT 
SEG xxxxxxxxxxxxx 
1atzB 

SEQ RGEPGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQPGTPGSKGSKGE 
SEG xxxxxxxxxxxxxxxxxxxxxx 
1atzB 

SEQ PGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKGDKGNQGEKGIQGQKGENGRQGI 
SEG xxxxxxxxxxxxxxxxxxxxx 
1atzB 

SEQ PGQQGIQGHHGAKGERGEKGEPGVRGAIGSKGESGVDGLMGPAGPKGQPGDPGPQGPPGL 
SEG xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 
1atzB 

SEQ DGKPGREFSEQFIRQVCTDVIRAQLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEG 
SEG xxxxxx xxxxxxxxxxxxxxxx 
1atzB 

SEQ PRGLPGLPGRDGVPGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPEGP 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx 
1atzB 

SEQ PGISKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARRDPFRKGPNY 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
1atzB 


Prosite for DKFZphfbr2_2b5.2

PS00001 62->66 ASN_GLYCOSYLATION PDOC00001
PS00001 329->333 ASN_GLYCOSYLATION PDOC00001
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005
PS00005 286->289 PKC_PHOSPHO_SITE PDOC00005
PS00005 393->396 PKC_PHOSPHO_SITE PDOC00005
PS00005 811->814 PKC_PHOSPHO_SITE PDOC00005
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006
PS00006 261->265 CK2_PHOSPHO_SITE PDOC00006
PS00006 343->347 CK2_PHOSPHO_SITE PDOC00006
PS00006 357->361 CK2_PHOSPHO_SITE PDOC00006
PS00006 393->397 CK2_PHOSPHO_SITE PDOC00006
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006
PS00006 531->535 CK2_PHOSPHO_SITE PDOC00006
PS00006 600->604 CK2_PHOSPHO_SITE PDOC00006
PS00006 657->661 CK2_PHOSPHO_SITE PDOC00006
PS00006 681->685 CK2_PHOSPHO_SITE PDOC00006
PS00006 750->754 CK2_PHOSPHO_SITE PDOC00006
PS00006 754->758 CK2_PHOSPHO_SITE PDOC00006
PS00008 92->98 MYRISTYL PDOC00008
PS00008 112->118 MYRISTYL PDOC00008
PS00008 236->242 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 380->386 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00008 527->533 MYRISTYL PDOC00008
PS00008 596->602 MYRISTYL PDOC00008
PS00008 638->644 MYRISTYL PDOC00008
PS00008 650->656 MYRISTYL PDOC00008
PS00008 653->659 MYRISTYL PDOC00008
PS00008 665->671 MYRISTYL PDOC00008
PS00008 743->749 MYRISTYL PDOC00008
PS00008 746->752 MYRISTYL PDOC00008
PS00009 547->551 AMIDATION PDOC00009
PS00009 628->632 AMIDATION PDOC00009
PS00009 694->698 AMIDATION PDOC00009


Pfam for DKFZphfbr2_2b5.2

HMM_NAME von Willebrand factor type A domain

HMM *DIVFLIDGSdSIGpqNFNrMKDFIeRJfi4ERMDIgPDwIRVGWQYSdNP
 D+VF++DGS S+GP NF+++K+ ++++ ++DIGP+ I+VGVVQYSD P
Query 37 DLVFILDGSYSVGPENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYP 85

HMM RqEmrFmFNDYQNKeEILQaIqqMMyWMgggTNTGeAIQYVvrNMFweer
 E +++ Y + E++++A+ ++ ++GG T+TG AIQ++++++F +++
Query 86 VLE--IPLGSYDSGEHLTAAVESIL-YLGGNTKTGKAIQFALDYLFDKSS 132

HMM GmRWenvPQVMIIITDGRSQDDIRDplneMrrmaGIqvFalGIGNhDNnn
 +++++++TDG+SQDD++D+++++R+ I+ FAIG-H3
Query 133 RF----LTKIAVVLTDGKSQDDVKDAAQAARD-SKITLFAIGVGSETE-- 175

HMM WeELRelASePdEdHVFyVdDFeeLdnMqeqL*
 +ELR IA++P++ +VFYV+D+ +++ ++E +
Query 176 DAELRAIANKPSSTYVFYVEDYIAISKIREVM 207
DKFZphfbr2_2c1


group: brain derived 

DKFZphfbr2_2c1 encodes a novel 697 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 3973 bp 

Poly A stretch at pos. 3914, polyadenylation signal at pos. 3900 


1 GGGGGGATTT CGGCGGCGGA AACATGGCGG TCGCGGCCGG GCCGGTAACG 
51 GAGAAAGTTT ACGCCGACAC TGGCCTGTAT TAGCGCGTAT GGCCTCGGGC 
101 CCTCGTTCCC CAAGGCGTGC CGCCTCCCTG TTCTCAGTCG CAGGCTGAAG 
151 CCTTGTCTGC TCTCCTCCTT TTTGGTTTGG TTTTGGAACT GACTCCGAGG 
201 GTTGGGAGAG CGCGTTGGTG GCGACGGCCG AGTCAGATCA CTATAAACAA 
251 AATTTCCACA AGAGAAAATG TTGAAATAGG AGTTGCGGAT ACATTGGATA 
301 TACTGGATGA AATACAAGCG GTTAATTTTT GTAACGTGAG GGAAAAGCCC 
351 ACATTGCTGG TTACATGTGT AAATCACTGC GTTATTGCTT TAGTCATTGT 
401 CTCTATTTAG CAATGACAAG ACTGGAAGAA GTAAATAGAG AAGTGAACAT 
451 GCATTCTTCA GTGCGGTATC TTGGCTATTT AGCCAGAATC AATTTATTGG
501 TTGCTATATG CTTAGGTCTA TACGTAAGAT GGGAAAAAAC AGCAAATTCC 
551 TTAATTTTGG TAATTTTTAT TCTTGGTCTT TTTGTTCTTG GAATCGCCAG 
601 CATACTCTAT TACTATTTTT CAATGGAAGC AGCAAGTTTA AGTCTCTCCA 
651 ATCTTTGGTT TGGATTCTTG CTTGGCCTCC TATGTTTTCT TGATAATTCA 
701 TCCTTTAAAA ATGATGTAAA AGAAGAATCA ACCAAATATT TGCTTCTAAC 
751 ATCCATAGTG TTAAGGATAT TGTGCTCTCT GGTGGAGAGA ATTTCTGGCT 
801 ATGTCCGTCA TCGGCCCACT TTACTAACCA CAGTTGAATT TCTGGAGCTT 
851 GTTGGATTTG CCATTGCCAG CACAACTATG TTGGTGGAGA AGTCTCTGAG 
901 TGTCATTTTG CTTGTTGTAG CTCTGGCTAT GCTGATTATT GATCTGAGAA 
951 TGAAATCTTT CTTAGCTATT CCAAACTTAG TTATTTTTGC AGTTTTGTTA 
1001 TTTTTTTCCT CATTGGAAAC TCCCAAAAAT CCGATTGCTT TTGCGTGTTT 
1051 TTTTATTTGC CTGATAACTG ATCCTTTCCT TGACATTTAT TTTAGTGGAC 
1101 TTTCAGTAAC TGAAAGATGG AAACCCTTTT TGTACCGTGG AAGAATTTGC 
1151 AGAAGACTTT CAGTCGTTTT TGCTGGAATG ATTGAGCTTA CATTTTTTAT 
1201 TCTTTCCGCA TTCAAACTTA GAGACACTCA CCTCTGGTAT TTTGTAATAC 
1251 CTGGCTTTTC CATTTTTGGA ATTTTCAGGA TGATTTGTCA TATTATTTTT 
1301 CTTTTAACTC TTTGGGGATT CCATACCAAA TTAAATGACT GCCATAAAGT 
1351 ATATTTTACT CACAGGACAG ATTACAATAG CCTTGATAGA ATCATGGCAT 
1401 CCAAAGGGAT GCGCCATTTT TGCTTGATTT CAGAGCAGTT GGTGTTCTTT 
1451 AGTCTTCTTG CAACAGCGAT TTTGGGAGCA GTTTCCTGGC AGCCAACAAA 
1501 TGGAATTTTC TTGAGCATGT TCCTAATCGT TTTGCCATTG GAATCCATGG 
1551 CTCATGGGCT CTTCCATGAA TTGGGTAACT GTTTAGGAGG AACATCTGTT 
1601 GGATATGCTA TTGTGATTCC CACCAACTTC TGCAGTCCTG ATGGTCAGCC 
1651 AACACTGCTT CCCCCAGAAC ATGTACAGGA GTTAAATTTG AGGTCTACTG 
1701 GCATGCTCAA TGCTATCCAA AGATTTTTTG CATATCATAT GATTGAGACC 
1751 TATGGATGTG ACTATTCCAC AAGTGGACTG TCATTTGATA CTCTGCATTC 
1801 CAAACTAAAA GCTTTCCTCG AACTTCGGAC AGTGGATGGA CCCAGACATG 
1851 ATACGTATAT TTTGTATTAC AGTGGGCACA CCCATGGTAC AGGAGAGTGG 
1901 GCTCTAGCAG GTGGAGATAC ACTACGCCTT GACACACTTA TAGAATGGTG 
1951 GAGAGAAAAG AATGGTTCCT TTTGTTCCCG GCTTATTATC GTATTAGACA 
2001 GCGAAAATTC AACCCCTTGG GTGAAAGAAG TGAGGAAAAT TAATGACCAG 
2051 TATATTGCAG TGCAAGGAGC AGAGTTGATA AAAACAGTAG ATATTGAAGA 
2101 AGCTGACCCG CCACAGCTAG GTGACTTTAC AAAAGACTGG GTAGAATATA 
2151 ACTGCAACTC CTGTAATAAC ATCTGCTGGA CTGAAAAGGG ACGCACAGTG 
2201 AAAGCAGTAT ATGGTGTGTC AAAACGGTGG AGTGACTACA CTCTGCATTT 
2251 GCCAACGGGA AGCGATGTGG CCAAGCACTG GATGTTACAC TTTCCTCGTA 
2301 TTACATATCC CCTAGTGCAT TTGGCAAATT GGTTATGCGG TCTGAACCTT 
2351 TTTTGGATCT GCAAAACTTG TTTTAGGTGC TTGAAAAGAT TAAAAATGAG 
2401 TTGGTTTCTT CCTACTGTGC TGGACACAGG ACAAGGCTTC AAACTTGTCA 
2451 AATCTTAATT TGGACCCCAA AGCGGGATAT TAATAAGCAC TCATACTACC 
2501 AATTATCACT AACTTGCCAT TTTTTGTATG CTGTATTTTT ATTTGTGGAA 
2551 AATACCTTGC TACTTCTGTA GCTGCTCTCA CTTTGTCTTT TCTTAAGTAA 
2601 TTATGGTATA TATAAGGCGT TGGGAAAAAA CATTTTATAA TGAAAGTATG 
2651 TAGGGAGTCA AATGCTTACT GTAAATGCAT AAGAGACGTT AAAAATAACA 
2701 CTGCACTTTC AGGAATGTTT GCTTATGGTC CTGATTAGAA AGAAACAGTT 
2751 GTCTATGCTC TGCAATGGTC AATGATGAAT TACTAATGCC TTATTTTCTA
2801 GGCATATAAT AATAGTTTAG AGAATGTAGA CCAGATAAAT TTGTTTACTG 
2851 TTTTAAGAAA ACTACCAGTT TACTTACAGA AGATTCTTTT TTCCAAACAG 
2901 TAGGTTTCAT CCAAGACCAT TTGAAGAACT GCAAACTCTT TCTCTTAGAA 
2951 AAGAAAGAGG GCAGCCTAAA ATAAACGCAA AATTTGCTTA TACTCCATCA 
3001 CATTCAGATG TCTTGGTTGT GACTTATTAC CAGTGTGGCA GAGAACCCAA 
3051 GTTACATTTT AGATCAAAAT ATTCTTTATG TAGGTATTGT TAAAAGGCTA 
3101 GAGCCTACAA GTTGCTCTTC CATGCGTTGG TCAGGGGGCC CTGAAAACAC 
3151 TGGTAATATT AAGAGTCTTT CTCAGGGTAA CTTAATGTTT TCTTAATGAA 
3201 CAGTGTTTCC AGCTACAAAT TCTTCCAATA AATTGTCTTC CTTTTTGAAA 
3251 AGTACTCTCA TAGAAGAAAT TTAGCAATTT CTCGTTGACT GACTCAGTCT 
3301 ATTTTAAGTA TTCAGAAAAG ATTTTGATCC CCATTGAGTT AATGCTCTGC 
3351 CTTGAAAATT ATTTTTCTGA TCCTTGTTAG TGATAACATT TTTTTTCTAC 
3401 TGAAGGTCAG AGGATAGGAA ACAAGTATTT CTCTTCTGGT ATACATGTAA 
3451 TGTATTCTGT AAAAAAGTAT TCATATTGGC AATTTTAGTT AGGCATAATA 
3501 TTGTGGTTGT AATTTTTAAA ACTTAGTGTT TTGTCTGATT AAAGCAGGCA 
3551 CTGATCAGGG TATCTCCTAA GAGGTAATTC ACTTCTTATT CCTTTCCAAT 
3601 AATTATTACA TTCTAAATTT TCATCTATGA GAAATAACAA ACAAGAAGGG
3651 AATAGAATTA AATTGGGGTA TAATCTAATC TTCATTGTTT AAATGGTTTG 
3701 CCTTCTCACC ATTGAAGCCA TTTTTTTATA GCCTCAGAAA GAGGAAATAA 
3751 TGCCTCCACC ATTTTCTACC TGGTGACTTG AAAATTGAAC TTTTAAGTTA 
3801 GGAAGAAGTT AGAGTCAGGG AACTTGTATA CCACTATCTA TGCAGCATTG 
3851 TTATAGTCTG ATTATTTCTG TGTTTTGAAT ATGATTTTCC TAATGCTCTA 
3901 AATAAAATTT TGTTAAAAAT CAAAAAAAAA AAAAAAAAAA CTTATCGATA 
3951 CCGTCGACCT CGATGATGTC GAC 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 365 bp to 2455 bp; peptide length: 697 
Category: putative protein 
Classification: unset 
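
The ORF coordinates, reading frame and peptide length quoted above can be cross-checked with simple arithmetic. The following is only a sketch: the function names are illustrative and not part of the original analysis pipeline, and it assumes the coordinates are 1-based, inclusive, and exclude the stop codon (which is what the reported lengths suggest).

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive nucleotide coordinates.

    Assumption: the stop codon is not part of the reported range.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return span // 3


def reading_frame(start: int) -> int:
    """Forward-strand frame (1, 2 or 3) of a 1-based start position."""
    return (start - 1) % 3 + 1


if __name__ == "__main__":
    # ORF from 365 bp to 2455 bp, reported in frame 2
    print(orf_peptide_length(365, 2455), reading_frame(365))
```

Under these assumptions the range 365 to 2455 spans 2091 nucleotides, which corresponds to the 697 residues and frame 2 reported for this clone.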


1 MCKSLRYCFS HCLYLAMTRL EEVNREVNMH SSVRYLGYLA RINLLVAICL 
51 GLYVRWEKTA NSLILVIFIL GLFVLGIASI LYYYFSMEAA SLSLSNLWFG 
101 FLLGLLCFLD NSSFKNDVKE ESTKYLLLTS IVLRILCSLV ERISGYVRHR 
151 PTLLTTVEFL ELVGFAIAST TMLVEKSLSV ILLVVALAML IIDLRMKSFL
201 AIPNLVIFAV LLFFSSLETP KNPIAFACFF ICLITDPFLD IYFSGLSVTE
251 RWKPFLYRGR ICRRLSVVFA GMIELTFFIL SAFKLRDTHL WYFVIPGFSI
301 FGIFRMICHI IFLLTLWGFH TKLNDCHKVY FTHRTDYNSL DRIMASKGMR 
351 HFCLISEQLV FFSLLATAIL GAVSWQPTNG IFLSMFLIVL PLESMAHGLF 
401 HELGNCLGGT SVGYAIVIPT NFCSPDGQPT LLPPEHVQEL NLRSTGMLNA 
451 IQRFFAYHMI ETYGCDYSTS GLSFDTLHSK LKAFLELRTV DGPRHDTYIL 
501 YYSGHTHGTG EWALAGGDTL RLDTLIEWWR EKNGSFCSRL IIVLDSENST 
551 PWVKEVRKIN DQYIAVQGAE LIKTVDIEEA DPPQLGDFTK DWVEYNCNSC 
601 NNICWTEKGR TVKAVYGVSK RWSDYTLHLP TGSDVAKHWM LHFPRITYPL 
651 VHLANWLCGL NLFWICKTCF RCLKRLKMSW FLPTVLDTGQ GFKLVKS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2c1, frame 2

PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii. 
Score = 96, P = 0.12 


>PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii 
Length = 288 

HSPs: 


Score = 96 (14.4 bits), Expect = 1.3e-01/P = 1.2e-01 
Identities = 59/234 (25%), Positives = 116/234 (49%)
Query: 77 IASILYYYFSMEAASLSLSNLWFGFLL--GL--LCFLDNSSFKNDVKEESTKYLLLTSIV 132
 ++ +LYY F+ A ++ L G+LL + L +L N + V+ + K + ++
Sbjct: 57

Query: 133 ... 192
 +L +L I YV ++ T+ FL+LVG ++ +L E +L ++ L+ L
Sbjct: 116

Query: 193
 M SF + + F +LL F T +N I + FI I
Sbjct: 169

Query: 252
 P ++ R+ R S+++ + L TF +L +F L +T L ++IP F++
Sbjct: 226

Query: 309
Sbjct: 283


Pedant information for DKFZphfbr2_2c1, frame 2
Report for DKFZphfbr2_2c1.2


[LENGTH] 697

[MW] 79741.46

[pI] 8.41

[KW] TRANSMEMBRANE 11

[KW] LOW_COMPLEXITY 9.76 %


SEQ MCKSLRYCFSHCLYLAMTRLEEVNREVNMHSSVRYLGYLARINLLVAICLGLYVRWEKTA 

SEG 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhcccccc 

MEM ................................... MMMMMMMMMMMMMMMMM

SEQ NSLILVIFILGLFVLGIASILYYYFSMEAASLSLSNLWFGFLLGLLCFLDNSSFKNDVKE 

SEG . .xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

MEM ...MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ ESTKYLLLTSIVLRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSV 

SEG xxxxxxxxxxxx xxxx 

PRD ccchhhhhhhhhhhhhhhhhhhceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM .... MMMMMMMMMMMMMMMMM MMM 

SEQ ILLVVALAMLIIDLRMKSFLAIPNLVIFAVLLFFSSLETPKNPIAFACFFICLITDPFLD

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhcccccccccchhhhhhhhhcccccee 

MEM MMMMMMMMMMMMMM. . .MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM. 

SEQ IYFSGLSVTERWKPFLYRGRICRRLSVVFAGMIELTFFILSAFKLRDTHLWYFVIPGFSI

SEG 

PRD eeeccccccccccceeecccccccchhhhhhhhhhhhhhhhhhhccccceeeeeeccccc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FGIFRMICHIIFLLTLWGFHTKLNDCHKVYFTHRTDYNSLDRIMASKGMRHFCLISEQLV

SEG 

PRD hhhhhhhhhhhhhhhhhcccccccceeeeeeeccccccchhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MM 

SEQ FFSLLATAILGAVSWQPTNGIFLSMFLIVLPLESMAHGLFHELGNCLGGTSVGYAIVIPT 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhheeehhhhhhhhhhccccccccccceeeeeeec 

MEM MMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ NFCSPDGQPTLLPPEHVQELNLRSTGMLNAIQRFFAYHMIETYGCDYSTSGLSFDTLHSK 

SEG 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhccccccccccccchhhhhh 

MEM 

SEQ LKAFLELRTVDGPRHDTYILYYSGHTHGTGEWALAGGDTLRLDTLIEWWREKNGSFCSRL 

SEG 

PRD hhhhhhhhhccccccceeeeeeccccccccceeeccccchhhhhhhhhhhhccccceeee 

MEM 

SEQ IIVLDSENSTPWVKEVRKINDQYIAVQGAELIKTVDIEEADPPQLGDFTKDWVEYNCNSC 
SEG

PRD eeeeecccccccchhhhhhccceeeeccceeeeeeecccccccccccccceeeec

MEM

SEQ NNICWTEKGRTVKAVYGVSKRWSDYTLHLPTGSDVAKHWMLHFPRITYPLVHLANWLCGL

SEG

PRD eeeecccceeeeeeeecccccceeeecccccchhhhhhhccccccc

MEM 

SEQ NLFWICKTCFRCLKRLKMSWFLPTVLDTGQGFKLVKS 

SEG 

PRD eeeeeehhhhhhhhhhhhhhcceeeeccccccccccc 

MEM 

(No Prosite data available for DKFZphfbr2_2c1.2)
(No Pfam data available for DKFZphfbr2_2c1.2)
group: signal transduction

DKFZphfbr2_2c17.3 encodes a novel 446 amino acid protein with similarity to yeast YMR131c and
mammalian retinoblastoma-binding protein RbAp46.

The protein contains 1 WD-40 repeat, which is typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition.

The new protein can find application in modulating/blocking G-protein-dependent pathways.


similarity to YMR131c and retinoblastoma-binding protein RbAp46

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2248 bp 

Poly A stretch at pos. 2230, polyadenylation signal at pos. 2200 
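
The polyadenylation signal and poly A stretch can be located in the sequence listing below with a plain motif search. The sketch that follows is illustrative only: the function names, the canonical AATAAA hexamer and the minimum run length of ten adenines are assumptions, not the criteria used to produce the positions above. It is applied to the last row of the listing, which starts at nucleotide 2201.

```python
import re
from typing import List, Optional


def motif_positions(fragment: str, first_pos: int, motif: str = "AATAAA") -> List[int]:
    """1-based positions of every (possibly overlapping) motif occurrence.

    `first_pos` is the 1-based coordinate of the first base of `fragment`.
    """
    seq = "".join(fragment.split()).upper()
    return [first_pos + m.start() for m in re.finditer(f"(?={motif})", seq)]


def poly_a_start(fragment: str, first_pos: int, min_run: int = 10) -> Optional[int]:
    """1-based position of the first run of at least `min_run` adenines."""
    seq = "".join(fragment.split()).upper()
    m = re.search(r"A{%d,}" % min_run, seq)
    return None if m is None else first_pos + m.start()


if __name__ == "__main__":
    last_row = "AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA"
    print(motif_positions(last_row, first_pos=2201))
    print(poly_a_start(last_row, first_pos=2201))
```

With 1-based coordinates the hexamer is found at 2201 and the adenine run at 2231, each one higher than the positions quoted above (2200 and 2230), which is consistent with the quoted values being zero-based offsets for this clone.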


1 TGGGGAAGAT GGCGGCGCGC AAGGGTCGGC GTCGCACGTG TGAAACCGGG 
51 GAACCCATGG AAGCCGAGTC CGGCGACACA AGTTCCGAGG GCCCGGCCCA 
101 GGTCTACCTG CCCGGCCGGG GGCCGCCGCT ACGCGAAGGG GAGGAGCTGG 
151 TCATGGACGA GGAGGCCTAT GTGCTCTACC ACCGAGCGCA GACTGGCGCC 
201 CCCTGTCTCA GCTTTGACAT AGTCCGGGAT CACCTGGGAG ACAACCGGAC 
251 AGAGCTTCCT CTTACACTTT ACTTCTGTGC TGGGACCCAG GCTGAGAGCG 
301 CCCAGAGCAA CAGACTGATG ATGCTTCGGA TGCACAATCT GCATGGGACA 
351 AAGCCCCCAC CCTCAGAGGG CAGTGATGAA GAAGAAGAGG AGGAAGATGA 
401 AGAGGATGAA GAAGAGCGGA AACCTCAGCT GGAGCTGGCC ATGGTGCCCC 
451 ACTATGGTGG CATCAACCGA GTTCGGGTGT CATGGCTGGG TGAAGAGCCT 
501 GTGGCTGGGG TGTGGTCAGA GAAGGGCCAG GTGGAGGTGT TTGCGCTGCG 
551 GCGGCTTCTG CAGGTGGTGG AGGAGCCCCA GGCCCTGGCA GCCTTCCTCC 
601 GGGATGAGCA GGCCCAAATG AAGCCCATCT TCTCCTTCGC TGGACACATG 
651 GGCGAGGGCT TTGCCCTTGA CTGGTCCCCC CGGGTGACCG GTCGCCTGCT 
701 GACCGGTGAC TGTCAAAAGA ACATCCACCT CTGGACACCT ACGGACGGCG 
751 GCTCCTGGCA CGTGGACCAG CGGCCATTCG TGGGCCACAC ACGCTCTGTG 
801 GAGGACCTGC AGTGGTCACC GACTGAGAAC ACGGTGTTTG CCTCCTGCTC 
851 AGCTGACGCC TCCATCCGCA TCTGGGACAT CCGGGCAGCC CCCAGCAAGG 
901 CCTGCATGCT CACCACAGTC ACCGCCCATG ATGGGGACGT CAATGTCATC 
951 AGCTGGAGCC GCCGGGAGCC CTTCCTGCTC AGTGGCGGGG ATGATGGGGC 
1001 CCTCAAGATC TGGGACCTTC GGCAGTTCAA GTCTGGTTCC CCAGTGGCCA 
1051 CCTTCAAGCA GCACGTGGCC CCCGTGACCT CCGTCGAGTG GCACCCCCAG 
1101 GACAGCGGGG TCTTTGCAGC CTCGGGTGCA GACCACCAGA TCACACAGTG 
1151 GGACCTGGCA GTGGAGCGGG ACCCTGAGGC GGGCGACGTG GAGGCCGACC 
1201 CCGGACTGGC CGACCTCCCG CAGCAGCTGC TGTTCGTGCA CCAGGGCGAG 
1251 ACCGAGCTGA AGGAGCTGCA CTGGCACCCG CAGTGCCCAG GGCTCCTGGT 
1301 CAGCACGGCG CTGTCAGGCT TCACCATCTT CCGCACCATC AGCGTCTGAG 
1351 GCGTCCCACT GGCTCTGATC TTGCTTCCTG CTTGGAAACT GAAGTCGAAT 
1401 TGGGCTCCCC TGGAAGGGGT TCATTCAGGT CTGTTGACTG AGACTGGCCG 
1451 GCCTGTGGGC TGCCGTGATG GATTCTGTTT GACGTATTGT TCTCTAGAAG 
1501 GCCTGGCTCT GATCCAGTGA CCCCTCTCAC CAAAGAACTC GGTTTAACCA 
1551 GGGCTCTGTA AGACCACTCC CACCCAGAGA CTTGTGTGGC CTGGTGTGGC 
1601 CTGTGTGTCG GATTCCTTCC TGTCAGCTGT GACCCATTTG ACCTGTGTCC 
1651 CCAGAACCCA GTTTTTTGTT TGTTTGTTTG AGACGGAGTC TTGGTCTGTC 
1701 GCCCAGGCTG GAGTGCAGTA GCACGATCTT GGCTCACTGC AACCTCCGCC 
1751 TCCTGGGTTA AAGTGATTCT CTCAGCTCAG TCTCCCAGGT AGCTGGGATT 
1801 ACAGGCATGT GCCACCACAC CCCGTTAATT TTTGTATTTT TAGTAGAGAC 
1851 GGGGTTTCAC CATGTTGGCC AGGCTGGTCT CAAATTCTTG ATCTCAAGTG 
1901 ATCTGTCCGC CCCGGCCTCC CAGAGTGCTG GGTTGGGATT ACAGGCGTGA 
1951 GCCACCGCGT CCGGCTCAGG ACCCAGTTTT GGCTGCTGGT TCCCAGCAGG 
2001 GGACTCGGGG GATATACAGT GGCTGCACCA AATTGGAGGT GTGGGTTCCT 
2051 CCAACACAAT TTGCTTCTGC CCGTTGTCTT CCTGCCAGCT GGGTTTGGCC 
2101 AGGATTTCTC CGTGTGGGGG CTACATGCGA CCCTCTCCCC TCCTCCCTGA 
2151 CTTTAGAGGC TGGTGCTGTG TCGGGAGGAA GGTCAGGGCT CCTGAGCAGC 
2201 AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 
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Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 9 bp to 1346 bp; peptide length: 446 
Category: similarity to known protein 
Classification: unset 
Prosite motifs: WD_REPEATS (323-338) 
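
The start of the peptide listed below can be reproduced from the first row of the nucleotide sequence by conceptual translation from position 9. This is a minimal sketch using the standard genetic code; the helper name `translate` is illustrative, and unknown or incomplete codons are rendered as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate from a 1-based start position until a stop codon or the end."""
    seq = "".join(dna.split()).upper()
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    first_row = "TGGGGAAGAT GGCGGCGCGC AAGGGTCGGC GTCGCACGTG TGAAACCGGG"
    print(translate(first_row, start=9))
```

Translating these first 50 nucleotides from position 9 gives MAARKGRRRTCETG, which agrees with the first fourteen residues of the peptide.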


1 MAARKGRRRT CETGEPMEAE SGDTSSEGPA QVYLPGRGPP LREGEELVMD 
51 EEAYVLYHRA QTGAPCLSFD IVRDHLGDNR TELPLTLYLC AGTQAESAQS 
101 NRLMMLRMHN LHGTKPPPSE GSDEEEEEED EEDEEERKPQ LELAMVPHYG 
151 GINRVRVSWL GEEPVAGVWS EKGQVEVFAL RRLLQVVEEP QALAAFLRDE
201 QAQMKPIFSF AGHMGEGFAL DWSPRVTGRL LTGDCQKNIH LWTPTDGGSW
251 HVDQRPFVGH TRSVEDLQWS PTENTVFASC SADASIRIWD IRAAPSKACM
301 LTTVTAHDGD VNVISWSRRE PFLLSGGDDG ALKIWDLRQF KSGSPVATFK
351 QHVAPVTSVE WHPQDSGVFA ASGADHQITQ WDLAVERDPE AGDVEADPGL
401 ADLPQQLLFV HQGETELKEL HWHPQCPGLL VSTALSGFTI FRTISV

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_2c17, frame 3

TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic
sequence, complete sequence., N = 1, Score = 910, P = 2.7e-91

PIR:S53061 hypothetical protein YMR131c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 691, P = 4.3e-68

PIR:I49367 retinoblastoma-binding protein mRbAp46 - mouse, N = 1, Score


PIR:I39181 retinoblastoma-binding protein RbAp46 - human, N = 1, Score
= 338, P = 1.1e-30


>TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence,
complete sequence.

Length = 469

HSPs:

Score = 910 (136.5 bits), Expect = 2.7e-91, P = 2.7e-91
Identities = 195/442 (44%), Positives = 259/442 (58%)


Query: 18 EAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRAQTGAPCLSFDIVRDHLG 77
 EA S + S P +V+ PG L +GEEL D AY H G PCLSFDI+ D LG
Sbjct: 18 EASSSEIPSI-PTRVWQPGVDT-LEDGEELQCDPSAYNSLHGFHVGWPCLSFDILGDKLG 75

Query: 78 DNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKP---PPSEGSDEEEEEEDEED- 133
 NRTE P TLY+ AGTQAE A N + + ++ N+ G + P + G+ E+E+E+DE+D
Sbjct: 76 LNRTEFPHTLYMVAGTQAEKAAHNSIGLFKITNVSGKRRDVVPKTFGNGEDEDEDDEDDS 135

Query: 134 EEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFALRRLLQ 185
 E + P +++ V H+G +NR+R + W++ G V+V+ + L
Sbjct: 136 DSDDDDGDEASKTPNIQVRRVAHHGCVNRIRAMPQNSH-ICVSWADSGHVQVWDMSSHLN 194

Query: 186 VVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIHLWTPT 245
 + E + P+ +F+GH EG+A+DWSP GRLL+GDC+ IHLW P
Sbjct: 195 ALAESETEGKDGTSPVLNQAPLVNFSGHKDEGYAIDWSPATAGRLLSGDCKSMIHLWEPA 254

Query: 246 DGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACMLTTVT 305
 G SW VD PF GHT SVEDLQWSP E VFASCS D S+ +WDIR S A +
Sbjct: 255 SG-SWAVDPIPFAGHTASVEDLQWSPAEENVFASCSVDGSVAVWDIRLGKSPAL---SFK 310

Query: 306 AHDGDVNVISWSRREPFLL-SGGDDGALKIWDLRQFKSGSPV-ATFKQHVAPVTSVEWHP 363
 AH+ DVNVISW+R +L SG DDG I DLR KG V A F+ H P+TS+EW
Sbjct: 311 AHNADVNVISWNRLASCMLASGSDDGTFSIRDLRLIKGGDAVVAHFEYHKHPITSIEWSA 370

Query: 364 QDSGVFAASGADHQITQWDLAVERDPE------AGDVEADPGLADLPQQLLFVHQGETEL 417
 ++ A + D+Q+T WDL++E+D E A E DLP QLLFVHQG+ +L
Sbjct: 371 HEASTLAVTSGDNQLTIWDLSLEKDEEEEAEFNAQTKELVNTPQDLPPQLLFVHQGQKDL 430

Query: 418 KELHWHPQCPGLLVSTALSGFTIFRTISV 446
 KELHWH Q PG+++STA GF I ++
Sbjct: 431 KELHWHNQIPGMIISTAGDGFNILMPYNI 459
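
The percentages printed after the Identities and Positives counts can be recomputed from the raw counts. The sketch below is illustrative (the function name is not from the original report) and rests on one assumption drawn from the figures in this section: the percentage appears to be truncated to an integer rather than rounded, since 259/442 is about 58.6% and is reported as 58%.

```python
def blast_percent(matches: int, alignment_length: int) -> int:
    """Integer percentage as it appears in the HSP summary lines.

    Assumption: the value is truncated (floor), not rounded.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // alignment_length


if __name__ == "__main__":
    # Counts from the Arabidopsis thaliana hit above
    print(blast_percent(195, 442), blast_percent(259, 442))
```

The same truncation reproduces the other summary lines in this section, for example 59/234 (25%) and 116/234 (49%).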


Pedant information for DKFZphfbr2_2c17, frame 3


Report for DKFZphfbr2_2c17.3


[LENGTH] 446
[MW] 49447.38
[pI] 4.82

[HOMOL] TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat protein";
Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, complete sequence. 1e-90


[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR131c] 4e-65
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YEL056w] 4e-15
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YEL056w] 4e-15
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 4e-15
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 2e-13
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 2e-13
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 2e-13
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 2e-13
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 2e-13
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YPR178w] 1e-11
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-11
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 4e-09
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 4e-09
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 5e-09
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 5e-09
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 6e-09
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-08
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-08
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YLR429w] 3e-07
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 3e-06
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 3e-06
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 3e-06
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 4e-06
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021c] 4e-06
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-06
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-05
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 2e-05
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 2e-05
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 2e-05
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 3e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 5e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 5e-05
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 5e-05
[BLOCKS] BL00678
[SCOP] d2trcb_ 2.51-3.1.1 Transducin (heterotrimeric G protein), gamm 5e-29
[PIRKW] plasma 6e-07
[PIRKW] duplication 4e-12
[PIRKW] hormone 6e-07
[PIRKW] transmembrane protein 1e-07
[PIRKW] stomach 6e-07
[PIRKW] actin binding 1e-07
[PIRKW] leucine zipper 1e-07
[PIRKW] signal transduction 2e-06
[PIRKW] heterotrimer 2e-06
[PIRKW] peripheral membrane protein 6e-07
[PIRKW] GTP binding 2e-06
[SUPFAM] WD repeat homology 1e-63
[SUPFAM] yeast coatomer complex alpha chain 1e-07
[SUPFAM] GTP-binding regulatory protein beta chain 4e-07
[SUPFAM] PRL1 protein 8e-09
[SUPFAM] MSI1 protein 4e-12
[SUPFAM] coatomer complex beta' chain 1e-09
[PROSITE] WD_REPEATS 1
[PFAM] WD domain, G-beta repeats
[KW] All_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 3.14 %

SEQ MAARKGRRRTCETGEPMEAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRA
SEG
1gotB

SEQ QTGAPCLSFDIVRDHLGDNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKPPPSE
SEG
1gotB

SEQ GSDEEEEEEDEEDEEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFAL
SEG ..xxxxxxxxxxxxxx
1gotB

SEQ RRLLQVVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIH
SEG
1gotB EEECCCCCEEEEEETTT-TCEEEEEETTTEEE

SEQ LWTPTDGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACM
SEG
1gotB EEETTTT CEEEEEECCCCCEEEEEEETTTTCE-EEEEET

SEQ LTTVTAHDGDVNVISWSRREPFLLSGGDDGALKIWDLRQFKSGSPVATFKQHVAPVTSVE
SEG
1gotB EECBTTBTCCEEEEEETTTTTEEEEEETTTEEEEEE

SEQ WHPQDSGVFAASGADHQITQWDLAVERDPEAGDVEADPGLADLPQQLLFVHQGETELKEL
SEG
1gotB

SEQ HWHPQCPGLLVSTALSGFTIFRTISV
SEG
1gotB


Prosite for DKFZphfbr2_2c17.3
PS00678 323->338 WD_REPEATS PDOC00574

Pfam for DKFZphfbr2_2c17.3

HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*
 ++GH+ V ++ +SP + +++S S D ++R+WD
Query 257 FVGHTRSVEDLQWSPTENTVFASCSADASIRIWD 290

DKFZphfbr2_2c17.3 similarity to YMR131c and retinoblastoma-binding protein

Alignment to HMM consensus:
Query *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*
 + H+++V+ +++S + ++SG++DG +++WD
dkfzphfbr2 304 VTAHDGDVNVISWSRREPF-LLSGGDDGALKIWD 336
DKFZphfbr2_2c18


group: brain associated 

DKFZphfbr2_2c18 encodes a novel 302 amino acid protein with weak similarity to
cyclin-dependent kinase p130-PITSLRE.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 


weak similarity to cyclin-dependent kinase p130-PITSLRE
complete cDNA, complete cds, EST hits
Sequenced by Qiagen
Locus: unknown

Insert length: 2835 bp

Poly A stretch at pos. 2817, polyadenylation signal at pos. 2796


1 TGGGGCGGAC GGCGAGGGAG TCCAGAGCCT TGAGCCCGGT GCTCCTCCCT 
51 CGCGCAGCGG TGGCTCTGCG GCCGCTGGAG TAAACACTGC CTTTGTTCCC 
101 TAGCGCCTCG TCTTTCGTCG CCCCGTGCCC TCACGCCGCC GGGCTCTGGC 
151 CGGCCCGCCC TCGGTCCTTG AACCCCATTT CGGCTCGTGC CGTGCGGATG 
201 CAGCTGCCGG GCCTGGGTTT GGGCATTGAG CGGGAGGAGG AGGAGGAGCG 
251 GCGGCGCCTG GGCGGCATGC GATGGGGAAC TGCTGCTGGA CGCAGTGCTT 
301 CGGACTGCTT CGCAAGGAAG CGGGGCGGCT GCAGCGAGTA GGCGGCGGCG 
351 GAGGATCCAA GTATTTTAGA ACATGCTCAA GAGGTGAGCA CTTGACAATA 
401 GAGTTTGAGA ATCTAGTAGA AAGTGATGAA GGGGAGAGCC CAGGAAGCAG 
451 TCATAGGCCT CTTACTGAGG AAGAAATTGT TGACCTAAGA GAAAGGCATT 
501 ATGATTCCAT TGCCGAAAAA CAAAAAGATC TTGATGAGAA AATTCAAAAA 
551 GAGTTAGCCT TACAAGAAGA GAAGTTAAGA CTAGAAGAAG AAGCTTTATA 
601 CGCTGCACAG CGTGAAGCAG CCAGGGCAGC AAAGCAGCGA AAGCTCTTGG 
651 AGCAAGAAAG GCAGAGAATT GTGCAGCAAT ATCATCCTTC CAACAATGGA 
701 GAATATCAAA GTTCAGGACC AGAAGATGAC TTCGAATCTT GTTTGAGAAA 
751 TATGAAGTCA CAGTATGAAG TTTTTCGAAG TAGTAGACTC TCATCAGATG 
801 CTACAGTTTT GACACCAAAT ACAGAAAGCA GTTGTGATTT AATGACCAAA 
851 ACTAAATCAA CTAGTGGAAA TGACGACAGC ACATCCTTAG ATCTAGAGTG 
901 GGAAGATGAA GAAGGAATGA ATAGAATGCT TCCAATGAGA GAACGTTCCA 
951 AAACAGAGGA AGACATTCTA CGGGCAGCAC TTAAGTATAG CAACAAGAAG 
1001 ACTGGAAGTA ATCCTACATC AGCCTCTGAT GATTCCAATG GGCTGGAGTG 
1051 GGAAAATGAT TTTGTTAGTG CCGAAATGGA TGATAATGGA AATTCCGAGT 
1101 ATTCTGGATT TGTAAATCCT GTATTAGAAC TGTCTGATTC TGGCATAAGG 
1151 CATTCTGACA CAGATCAACA GACTCGATAG GGTAAAATTG TGTGACCTTG 
1201 TTTATCAGTT ATGACCAAAT GTTAAAAACC AACTAGAATG TATAAGTGAT 
1251 TGTGCTTAGC CTTTTTGTAA GGGAGATGTG TAAGAAACCA TGCTGTAAAT 
1301 GCTTATTTTA TTACAAAGGA GTAGGGATGA TAGGATCTGA ATTGATACAG 
1351 AATTAAGTGC AATTTCATCA TCTGCCTTCT GCTTTTCAAG ACCAATTTAA 
1401 TGGTCCTGTC ATGTTACTGA TTAAATTTAC TTTGTCTTGT CTTTATAGCA 
1451 TTTCTGTTTA CTATGGTAGA TTTCCACTTT CAATTTTTAA AATTAATTTT 
1501 ACTTTGAATG ATTTATGAAG CCTATTTCAT TGTCTAACTA TGAAAATATT 
1551 AAGACTTTTT TGTTAATTCT CAGCCGATGT GAAGGAAGCA TGAGGAGGGA 
1601 TCGTCAGACT CAGATTTAGA ATAGTGTTCC CGTTTCCAGC ATTATTTATT 
1651 TCTATGACTT CTTTGGATTT TATTATCTAA TAGTAAGTAC AGTTGATGTG 
1701 GGTAGATGAC TCTAAGAAAT GCTGAAGTAT CGGCATTACA TGTGTTTATT 
1751 TACATGTCCT AGTTTGATAA TGTTGATTCA ATCTGAACAA AAGATAATAT 
1801 AAAAATAACC CTTCAGAGTT TGGACATTTC AAGTTGGTAA TAAT7UU\AAA 
1851 TAATATTTAA GAAGATATAT ATATATATAT ATTTAGTTTT TTCCACTTCA 
1901 TTTTACATGC CACTATATTG ACTTTAATTG ATATACAGTA TTAAGTTTTT 
1951 AGGTGCCATT ATTTTTAAAA AATTCTATAT TTCCAATGAA CGATGTTAGA 
2001 TTTTACACAG AACATATTCT CTGCATGATT TCAGAAAAGA AAATCTAAAA 
2051 AGGTAATACG GGTATTTCAA ATAAAATCCT TTCTGGTATG AAAGGCTCCA 
2101 TTGATTTTAT TAAGCCTTCC TTTACCTTGT AGTACAAGGT GCTTTAATGG 
2151 GATAGAACTA AGCATATCAA TATCTATAAC TGCATTTTGT GCTAGACAAT 
2201 TACTGTTCTT TTCTCTAAAA TGTATATGTC AATTTACAAG GCCAGGGATA 
2251 GAAAACACTC CATAATTGCT TTCCTTGATT TTGCTGAGGA TTTGGTATGA 
2301 TTTTAGTAAG CAAACTGTTT TTTGGTTTTT CCTTAATGTT TTTAATTTTT 
2351 TTTCCTCTTG CAACAATGAC GGTGCATGTT CTTATAAATA TAGGAAGGTC 
2401 CAGATATAAA TAGTAACCTA AAGTTCTTGC TGTGCTTAAA AAAAAAAATC 
2451 ATGTGGCTCT TTCAATATTT GAACTGCTAA GCAATGACAT CTGTAGTTTT
2501 ATCTCCTTTT TTATGTCATA GAAATTAATA TGATACTTTA AATATGTAAA 
2551 TATAATACAT TGGTAATGCT ATTATTTATA TCTGTCTTAA CATAATTTAA 
2601 GTTGTAGCTG TGTCTTGGAA ATATTTTTAA GGTAATCTAT ATTCACATTG 
2651 CCTGTGTTAA TGCTTTTTAA GGTTTGTATA CATCAGATGT ATATTTTTGG 
2701 TTTGGCATAA GCTACGATTG TAATTTTTCT TGGCTTTTTG TTCATAAAGA
2751 ATTTTTTGAA GGAATGGTAA CAAATGGTAA TTTACAAATG GTTGTGAATA 
2801 AACACATTTT TACACTTAAA AAAAAAAAAA AAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 272 bp to 1177 bp; peptide length: 302 
Category: similarity to known protein 


1 MGNCCWTQCF GLLRKEAGRL QRVGGGGGSK YFRTCSRGEH LTIEFENLVE 
51 SDEGESPGSS HRPLTEEEIV DLRERHYDSI AEKQKDLDEK IQKELALQEE 
101 KLRLEEEALY AAQREAARAA KQRKLLEQER QRIVQQYHPS NNGEYQSSGP
151 EDDFESCLRN MKSQYEVFRS SRLSSDATVL TPNTESSCDL MTKTKSTSGN
201 DDSTSLDLEW EDEEGMNRML PMRERSKTEE DILRAALKYS NKKTGSNPTS
251 ASDDSNGLEW ENDFVSAEMD DNGNSEYSGF VNPVLELSDS GIRHSDTDQQ 
301 TR 


BLASTP hits 


Entry A55817 from database PIR:
cyclin-dependent kinase p130-PITSLRE - mouse
Length = 783

Score = 123 (43.3 bits), Expect = 0.00013, P = 0.00013
Identities = 53/197 (26%), Positives = 96/197 (48%)

Alert BLASTP hits for DKFZphfbr2_2c18, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_2c18, frame 2

Report for DKFZphfbr2_2c18.2

[LENGTH] 302

[MW] 34281.39

[pI] 4.73

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 13.58 %

[KW] COILED_COIL 13.58 %

SEQ MGNCCWTQCFGLLRKEAGRLQRVGGGGGSKYFRTCSRGEHLTIEFENLVESDEGESPGSS
SEG xxxxx
PRD eccccccchhh
COILS

SEQ HRPLTEEEIVDLRERHYDSIAEKQKDLDEKIQKELALQEEKLRLEEEALYAAQREAARAA
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS cccccccccccccccccccccccccccccccc

SEQ KQRKLLEQERQRIVQQYHPSNNGEYQSSGPEDDFESCLRNMKSQYEVFRSSRLSSDATVL
SEG xxxxxxx
PRD ccccccccc
COILS

SEQ TPNTESSCDLMTKTKSTSGNDDSTSLDLEWEDEEGMNRMLPMRERSKTEEDILRAALKYS

SEG 

PRD ccccccccccccccccccccccccchhhhhhhccccccchhhhhhhcchhhhhhhhhhhc 

COILS 

SEQ NKKTGSNPTSASDDSNGLEWENDFVSAEMDDNGNSEYSGFVNPVLELSDSGIRHSDTDQQ 

SEG 

PRD cccccccccccccccccccccccceeeecccccccccccccceeeecccccccccccccc 

COILS 

SEQ TR 
SEG 

PRD CC 
COILS 


Prosite for DKFZphfbr2_2c18.2

PS00005 60->63 PKC_PHOSPHO_SITE PDOC00005
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005
PS00005 240->243 PKC_PHOSPHO_SITE PDOC00005
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 65->69 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 198->202 CK2_PHOSPHO_SITE PDOC00006
PS00006 204->208 CK2_PHOSPHO_SITE PDOC00006
PS00006 226->230 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00008 24->30 MYRISTYL PDOC00008
PS00008 25->31 MYRISTYL PDOC00008
PS00008 199->205 MYRISTYL PDOC00008
PS00008 245->251 MYRISTYL PDOC00008
PS00008 291->297 MYRISTYL PDOC00008


(No Pfam data available for DKFZphfbr2_2cl8.2)
DKFZphfbr2_2dl5


group: differentiation/development 

DKFZphfbr2_2dl5 encodes a novel 438 amino acid protein with similarity to Mus musculus
testis-specific Y-encoded-like protein (Tspyll).

The TSPY genes are arranged in clusters on the Y chromosome of many mammalian species. TSPY is
believed to function in early spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y chromosome. The novel protein is a new member of the
TSPY-SET-NAP1L1 family, which represents proteins closely related to TSPY. Therefore, the new
protein seems to be involved in early spermatogenesis.

The new protein can find application in modulating early spermatogenesis. 


strong similarity to testis-specific Y-encoded-like protein

complete cDNA, complete cds, EST hits
localisation: primer B does not match perfectly

Sequenced by Qiagen

Locus: /map="729-2 cR from top of Chr6 linkage group"
Insert length: 3229 bp

Poly A stretch at pos. 3206, polyadenylation signal at pos. 3184 


1 GGAGACTGTA GGGTGGGCGG TGCGAGCGGC GGTTAGCTCC CAGTTCGGCC 
51 TCTGAGGAAA ACGGGCGTTC GCCTGCGGTT GGTCCGACTG TTAGCAACAT
101 GAGCGCCCTG GATGGGGTCA AGAGGACCAC TCCCCTCCAA ACCCACAGCA 
151 TCATTATTTC TGACCAAGTC CCGAGCGACC AGGACGCACA CCAGTACCTG 
201 AGGCTCCGCG ACCAAAGCGA GGCGACACAG GTGATGGCGG AGCCGGGTGA 
251 GGGAGGCTCG GAGACCGTCG CGCTCCCGCC TTCACCGCCT TCAGAGGAGG 
301 GGGGCGTACC CCAGGATCCC GCGGGCCGTG GCGGTACTCC CCAGATCCGA 
351 GTTGTTGGGG GTCGCGGTCA TGTGGCGATC AAAGCCGGGC AGGAAGAGGG 
401 CCAGCCTCCC GCCGAAGGCC TGGCAGCCGC TTCTGTGGTG ATGGCAGCCG 
451 ACCGCAGCCT GAAAAAGGGC GTTCAGGGTG GAGAGAAGGC CCTAGAAATC 
501 TGTGGCGCCC AGAGATCCGC GTCTGAGCTG ACGGCGGGGG CGGAGGCTGA 
551 GGCGGAGGAG GTGAAGACAG GAAAGTGCGC CACCGTCTCA GCAGCCGTGG 
601 CTGAGAGGGA GAGCGCTGAG GTGGTGGTGA AGGAAGGCCT GGCGGAGAAG 
651 GAGGTAATGG AGGAGCAGAT GGAGGTAGAG GAGCAGCCGC CAGAAGGTGA 
701 AGAAATAGAA GTGGCGGAGG AGGATAGATT GGAGGAGGAG GCGAGGGAGG 
751 AAGAAGGGCC CTGGCCTTTG CATGAGGCTC TCCGCATGGA CCCTCTGGAG 
801 GCCATCCAGC TGGAACTGGA CACTGTGAAT GCTCAGGCCG ACAGGGCCTT 
851 CCAACAGCTG GAGCACAAGT TTGGGCGGAT GCGTCGACAC TACCTGGAGC 
901 GGAGGAACTA CATCATTCAG AATATCCCGG GCTTCTGGAT GACTGCTTTT 
951 CGAAACCACC CCCAGTTGTC CGCCATGATT AGGGGCCAAG ATGGAGAGAT 
1001 GTTAAGGTAC ATAACCAATT TAGAGGTGAA GGAACTCAGA CACCCTAGAA 
1051 CCGGTTGCAA GTTCAAGTTC TTCTTTAGAA GAAACCCCTA CTTCAGAAAC 
1101 AAGCTGATTG TCAAGGAATA TGAGGTAAGA TCCTCCGGCC GAGTGGTGTC 
1151 TCTTTCTACT CCAATTATAT GGCGCAGGGG GCATGAACCC CAGTCCTTCA 
1201 TTCGCAGAAA CCAAGACCTC ATCTGCAGCT TCTTCACTTG GTTTTCAGAC 
1251 CACAGCCTTC CAGAGTCCGA CAAAATTGCT GAGATTATTA AAGAGGATCT 
1301 GTGGCCAAAT CCACTGCAAT ACTACCTGTT GCGTGAAGGA GTCCGTAGAG 
1351 CCCGACGTCG CCCGCTAAGG GAGCCTGTAG AGATCCCCAG GCCCTTTGGG 
1401 TTCCAGTCTG GTTAACATTT GCCCTTGGGA ATACTCCTGC ACAAGGTCTC 
1451 CTACCACCTT CTGCTGGACC TGTGCTTGGG CATCAGCAAT GAGTATGCCT 
1501 TCTATTGTGC TTTGTTTTTG CTGACTTTTC TGCACCCTGT TTCCTTTGGA 
1551 TATTCAGTTC TCTCAACCTC AAGATTGAGA CGGTGGTGGG TATGCTTCTC 
1601 CACTTCCATA TGACCTTCAT GCTGTTCTGG AATATCACAT GCTACGAGGT 
1651 CATCCTTCAC ACTACTTGTA AGCCAAGCAA ATGATACTGT AGATTGTACT 
1701 GCCTTTATCT GCACTGCTTG GACCCTGTTT ATTCCCAGGG CCTCTGAACT 
1751 GGTTGCTGTC ACTTGGATTT CTAGCTTTGG GAGCCTGTTC CACCTACTCA 
1801 GCTCTGCATT GAGCAGTATG GGCACATGCC CTGTGGACAG TTACTGGACG 
1851 TTAATGAACT CAGAGGAGAA AAGCAGTGAG CCACTTGTTC TGTGTGATTT 
1901 ATGGTACTTC ATTGCTCTTC CTTCACCTCT AGTCACTTTC TATTGCTACC 
1951 TGCCCTACAT TGGCTCCTGC CAAGGTCCCT CTCTCTCCCT GTTTTCCTTT 
2001 TTTTTTTTTT TTTTTTTTTT TTTTGAGACG GAGGACGGAG TCTTGCTCTG
2051 TCGCCCAGGT TGGAGTGCAG TGGCGCGATC TCGGCTCACT GCAACCTCCA 
2101 CCTCCCGGGT TCAAGCGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA 
2151 CTACAGGCGC GCGCCGCCAC GCCCGGCTAA TTTTTATATT TTTAGTAGAG 
2201 ACGGGGTTTC ACCATGCTGG CCAGGCTGGT CTCGAACCCC GACCTCGTGA 
2251 TCCGCCCTCC TTAGCCTCCC AATCCTCTCT TAAAAAAGTG ATAGCTCAGA 
2301 AATATTTGTA AAAGCAAGGT TTTTATTTCA TTTTGGCTCT GTCATTTTCA 
2351 GAGGCAAAGA AGTTGGCCTG TAAAATAGAG TGCTAGAGCT CTTACGCCCC 
2401 TCCCCTTCTT CCCAACTTCC TACTTCCTAG CCCTTTTATC AACTCCTAGA 
2451 ATAGTTAAAG AGAGACACAT CTAGATGGGA TGAAAGGTGC CCTAAGCAGG 
2501 AGAAACTGAA CAAAAGGCTA GAGGCATGGG CCAGGTAAAA ATTGGGCCTA
2551 GAGTGAAGAC TGTGCTGCCG TTAAGAGCTT TCGAGGAAGG AGTACTTACT 
2601 CCCCAATGAT GATGAATGGA GAAATACTTT TCAGGGAGAA TTGAAGGGGT 
2651 TAAAGTGTTA AATATGTTGC CTAGACAAGG GTTCTTTAAA GAAAGACAGC 
2701 GCAACTTTGA ATGCTTTCTT ACTTGTTTTG TGACCTAATT TATGTGGAAG 
2751 ATTGTTATTT CATTAGGATT TAGTAAAATT TTTTTTTCTG ATTCTAAACT
2801 TATTGTGAAA ATTGAGCTGT ACAGATATTC TTTTGATTTC AATTGGGAAC 
2851 ATTTGGAAGA ACAACAGTCT TACTTGCCTG TACAATATAG AGACATATGA 
2901 ATAGTCATAA CAGTTTTCAA CTTGTTCTTG TTTCTGTTAA ACTATATTCC 
2951 TAGAAACATA GTTTGAACAA CTTGGTCTTT GTTAGGCTTG TCAAATTGCC 
3001 TTCATGGAAA AATAATCTAC AAAAGTATGG TTTAATTGAT TGTCTTACAT 
3051 GATAATTTTC CCTGGCAACA ACTTAGTAAG TGATATATCT TTTTTCCTAA 
3101 ATTGCTTAAA TACTGTGAAA TTGCTCTGAC AAATTGGAAG TGTACCATTG 
3151 GCATATTTGT CTTCCTTTTT ATGCATGATG GTAAAATAAA AGCATGTTGT 
3201 TCTGCTAAGA AAAAAAAAAA AAAAAAAAA 


BLAST Results 


Entry AF042181 from database EMBLNEW:
Homo sapiens testis-specific Y-encoded-like protein (TSPYL) mRNA, partial cds.
Score = 3411, P = 6.9e-148, identities = 685/687

Entry HS938343 from database EMBL: 
human STS WI-11947. 
Score = 1195, P = 2.1e-46, identities = 273/299 


Medline entries 


98399864: 

Murine and human TSPYL genes: novel members of the TSPY-SET-NAP1L1 family


Peptide information for frame 3 


ORF from 99 bp to 1412 bp; peptide length: 438 
Category: strong similarity to known protein 
Classification : Differentiation/Development 

1 MSGLDGVKRT TPLQTHSIII SDQVPSDQDA HQYLRLRDQS EATQVMAEPG 

51 EGGSETVALP PSPPSEEGGV PQDPAGRGGT PQIRVVGGRG HVAIKAGQEE 

101 GQPPAEGLAA ASVVMAADRS LKKGVQGGEK ALEICGAQRS ASELTAGAEA 

151 EAEEVKTGKC ATVSAAVAER ESAEVVVKEG LAEKEVMEEQ MEVEEQPPEG

201 EEIEVAEEDR LEEEAREEEG PWPLHEALRM DPLEAIQLEL DTVNAQADRA 

251 FQQLEHKFGR MRRHYLERRN YIIQNIPGFW MTAFRNHPQL SAMIRGQDAE 

301 MLRYITNLEV KELRHPRTGC KFKFFFRRNP YFRNKLIVKE YEVRSSGRVV 

351 SLSTPIIWRR GHEPQSFIRR NQDLICSFFT WFSDHSLPES DKIAEIIKED

401 LWPNPLQYYL LREGVRRARR RPLREPVEIP RPFGFQSG 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2dl5, frame 3 

TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific
Y-encoded-like protein"; Mus musculus testis-specific Y-encoded-like
protein (Tspyll) mRNA, complete cds., N = 1, Score = 1202, P = 3.1e-122

TREMBL:AB018264_1 gene: "KIAA0721"; product: "KIAA0721 protein"; Homo
sapiens mRNA for KIAA0721 protein, partial cds., N = 1, Score = 798, P
= 2e-79

TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA,
partial cds., N = 1, Score = 570, P = 2.9e-55


>TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific Y-encoded-like 
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyll) 
mRNA, complete cds. 

Length = 379 

HSPs: 
Score = 1202 (180.3 bits), Expect = 3.1e-122, P = 3.1e-122
Identities = 258/377 (68%), Positives = 283/377 (75%)


Query:  62 SPPSEEGGVPQDPAGR GGTPQIRVVGGRGHVAIKAGQEE-- GQP-P-- AEGLAA 110
           SP +EG D G GTP R + G G+ GPP EGL
Sbjct:   3 SPERDEGTPVPDSRGHCDADTVSGTPDRRPLLGEEKAVTGEGRAGIVGSPAPRDVEGLVP 62

Query: 111 ASVVMAADRSLKK-GVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAE 169
           V AA + V+G A+ + ++ T GAE++A +VKT + TV+AA
Sbjct:  63 QIRVAAARQGESPPSVRGPAAAVFVTPKYVEKAQETRGAESQARDVKT-EPGTVAAAA-- 119

Query: 170 RESAEVVVKEGLAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALR 229
           E +EV EE MEVE Q P GEE+E+ E EA EE GPW L LR
Sbjct: 120 -EKSEVATPGS EEVMEVE-QKPAGEEMEMLEASGGVREAPEEAGPWHLGIDLR 170

Query: 230 MDPLEAIQLELDTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ 289
           + PLEAIQLELDTVNAQADRAFQ LE KFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ
Sbjct: 171 RNPLEAIQLELDTVNAQADRAFQHLEQKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQ 230

Query: 290 LSAMIRGQDAEMLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 349
           LSAMIRG+DAEMLRY+T+LEVKELRHP+TGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV
Sbjct: 231 LSAMIRGRDAEMLRYVTSLEVKELRHPKTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 290

Query: 350 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYY 409
           VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESD+IAEIIKEDLWPNPLQYY
Sbjct: 291 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDRIAEIIKEDLWPNPLQYY 350

Query: 410 LLREGVRRARRRPLREPVEIPRPFGFQSG 438
           L REG+RR RRRP+REPVEIPRPFGFQSG
Sbjct: 351 LCREGIRRPRRRPIREPVEIPRPFGFQSG 379


Pedant information for DKFZphfbr2_2dl5, frame 3


Report for DKFZphfbr2_2dl5.3 


[LENGTH] 438
[MW] 49307.65
[pI] 5.36
[HOMOL] TREMBL:AF042180_1 gene: "Tspyll"; product: "testis-specific Y-encoded-like protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyll) mRNA, complete cds. 1e-107
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR048c] 1e-07
[BLOCKS] BL00376F
[PIRKW] nucleus 6e-39
[PIRKW] DNA binding 3e-06
[PIRKW] phosphoprotein 6e-39
[PIRKW] alternative splicing 6e-39
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 22.83 %

SEQ MSGLDGVKRTTPLQTHSIIISDQVPSDQDAHQYLRLRDQSEATQVMAEPGEGGSETVALP

SEG X

PRD ccccccccccccccceeeeecccccccccchhhhhhhhchhhhhcccccccccceeeecc

SEQ PSPPSEEGGVPQDPAGRGGTPQIRVVGGRGHVAIKAGQEEGQPPAEGLAAASVVMAADRS

SEG xxxxxxxxx

PRD ccccccccccccccccccccceeeeecccceeeeecccccccccchhhhhhhhhhhhhcc

SEQ LKKGVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAERESAEVVVKEG

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx .

PRD ccccccccccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ LAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALRMDPLEAIQLEL

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh

SEQ DTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQLSAMIRGQDAE

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeecccccccccccccchhh

SEQ MLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRVVSLSTPIIWRR

SEG

PRD hhhhhhhhhhhhhcccccceeeeeeeccccccchhhhhhccccccccccccccceeeecc 

SEQ GHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYYLLREGVRRARR 

SEG xxxxxxxxxxx 

PRD ccccchhhhhhcccccceeeeeccccccccchhhhhhhhhcccccceeeeccccchhhhh 

SEQ RPLREPVEIPRPFGFQSG 

SEG xxxxxxxx 

PRD hccccccccccccccccc 

(No Prosite data available for DKFZphfbr2_2dl5.3)
(No Pfam data available for DKFZphfbr2_2dl5.3)
DKFZphfbr2_2dl7


group: transmembrane proteins 

DKFZphfbr2_2dl7 encodes a novel 292 amino acid protein with similarity to a C. elegans
hypothetical protein.

One transmembrane region is predicted for the protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 

similarity to C.elegans hypothetical protein 

TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1009 bp 

Poly A stretch at pos. 990, polyadenylation signal at pos. 969 


1 TGGGCCTGTG GCTGGGGGCA GAGCTCAGAC TGTCTTCTGA AGATTGATGT
51 CTATTTCCTT GAGCTCTTTA ATTTTGTTGC CAATTTGGAT AAACATGGCA
101 CAAATCCAGC AGGGAGGTCC AGATGAAAAA GAAAAGACTA CCGCACTGAA
151 AGATTTATTA TCTAGGATAG ATTTGGATGA ACTAATGAAA AAAGATGAAC
201 CGCCTCTTGA TTTTCCTGAT ACCCTGGAAG GATTTGAATA TGCTTTTAAT
251 GAAAAGGGAC AGTTAAGACA CATAAAAACT GGGGAACCAT TTGTTTTTAA
301 CTACCGGGAA GATTTACACA GATGGAACCA GAAAAGATAC GAGGCTCTAG
351 GAGAGATCAT CACGAAGTAT GTATATGAGC TCCTGGAAAA GGATTGTAAT
401 TTGAAAAAAG TATCTATTCC AGTAGATGCC ACTGAGAGTG AACCAAAGAG
451 TTTTATCTTT ATGAGTGAGG ATGCTTTGAC AAATCCACAG AAACTGATGG
501 TTTTAATTCA TGGTAGTGGT GTTGTCAGGG CAGGGCAGTG GGCTAGAAGA
551 CTTATTATAA ATGAAGATCT GGACAGTGGC ACACAGATAC CGTTTATTAA
601 AAGAGCTGTG GCTGAAGGAT ATGGAGTAAT AGTACTAAAT CCCAATGAAA
651 ACTATATTGA AGTAGAAAAG CCGAAGATAC ACGTACAGTC ATCATCTGAT
701 AGTTCAGATG AACCAGCAGA AAAACGGGAA AGAAAAGATA AAGTTTCTAA
751 AGTAACAAAG AAGCGACGTG ATTTCTATGA GAAGTATCGT AACCCCCAAA
801 GAGAAAAAGA AATGATGCAA TTGTATATCA GAGTGAGTGA GATCACTACT
851 TTCCTTTACT ATTTTCTTTA CCTTGTATAT ATTTTATTAT ATGTAGATTG
901 TTTTGTTTTT CTTCAAGAAT ATTAATTTCT TTATTTGTCA TCATTTATTT
951 CCCATGGTCG TCTACTTGGA TTAAATGGGT TTTTAAATTC AAAAAAAAAA
1001 AAAAAAAAA


BLAST Results 


Entry 189937 from database EMBL: 
Sequence 11 from patent US 5723315. 
Score = 1083, P = 2.2e-42, identities = 223/231

Entry 189938 from database EMBL:
Sequence 12 from patent US 5723315.
Score = 875, P = 7.4e-33, identities = 175/175


Medline entries


No Medline entry


Peptide information for frame 2 


ORF from 47 bp to 922 bp; peptide length: 292
Category: similarity to unknown protein 
Classification: unset 
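
The reported ORF coordinates and peptide length can be cross-checked with simple arithmetic. The following is a minimal Python sketch; the function name `orf_peptide_length` is hypothetical, and it assumes that the coordinates are 1-based and inclusive and that the range covers the coding codons only (no stop codon), which is what the figures reported here appear to follow.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumptions (not stated in the report itself): coordinates are
    1-based and inclusive, and the stop codon is not part of the range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# Coordinates as reported for DKFZphfbr2_2dl7, frame 2
print(orf_peptide_length(47, 922))  # 292
```

Under these assumptions the span 47 to 922 covers 876 nucleotides, that is 292 codons, in agreement with the stated peptide length.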


1 MSISLSSLIL LPIWINMAQI QQGGPDEKEK TTALKDLLSR IDLDELMKKD 
51 EPPLDFPDTL EGFEYAFNEK GQLRHIKTGE PFVFNYREDL HRWNQKRYEA

101 LGEIITKYVY ELLEKDCNLK KVSIPVDATE SEPKSFIFMS EDALTNPQKL 

151 MVLIHGSGVV RAGQWARRLI INEDLDSGTQ IPFIKRAVAE GYGVIVLNPN 

201 ENYIEVEKPK IHVQSSSDSS DEPAEKRERK DKVSKVTKKR RDFYEKYRNP 

251 QREKEMMQLY IRVSEITTFL YYFLYLVYIL LYVDCFVFLQ EY 

BLASTP hits 

Entry S67436 from database PIR: 

hypothetical protein - fission yeast (Schizosaccharomyces pombe) 
Length = 266 

score = 112 (39.4 bits), Expect = 0.00037, P = 0.00037 
Identities = 33/147 (22%), Positives = 69/147 (46%) 

Entry CEY75B8A_12 from database TREMBLNEW: 

gene: "Y75B8A.31"; Caenorhabditis elegans cosmid Y75B8A 

score = 327, P = 1.5e-29, identities = 72/140, positives = 93/140


Alert BLASTP hits for DKFZphfbr2_2dl7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2dl7, frame 2 


Report for DKFZphfbr2_2dl7 .2 


[LENGTH] 292

[MW] 34260.50

[pI] 5.50

[HOMOL] TREMBLNEW:AF064782_1 product: "unknown"; Mus musculus clone pEN87 unknown mRNA, partial cds. 1e-119

[KW] SIGNAL_PEPTIDE 19

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 10.96 %

SEQ MSISLSSLILLPIWINMAQIQQGGPDEKEKTTALKDLLSRIDLDELMKKDEPPLDFPDTL 

SEG .xxxxxxxxxxxxxx 

PRD ccchhhhhhchhhhhhhccccccccccchhhhhhhhhhhhhcchhhhhhccccccccccc 
MEM 


SEQ EGFEYAFNEKGQLRHIKTGEPFVFNYREDLHRWNQKRYEALGEIITKYVYELLEKDCNLK 

SEG 

PRD hhhhhhcccccceeeecccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhe 


MEM 


SEQ KVSIPVDATESEPKSFIFMSEDALTNPQKLMVLIHGSGVVRAGQWARRLIINEDLDSGTQ

SEG 

PRD eeeccccccccccceeeeeeccccccccceeeeeecccccchhhhhcccccccccccccc 

MEM 

SEQ IPFIKRAVAEGYGVIVLNPNENYIEVEKPKIHVQSSSDSSDEPAEKRERKDKVSKVTKKR 

SEG 

PRD chhhhhhhhccceeeeeccccceeeeeccceeeeccccccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RDFYEKYRNPQREKEMMQLYIRVSEITTFLYYFLYLVYILLYVDCFVFLQEY 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhcccchhhhhhhhhhhhheeeeehhhhhhhhhhhhheeeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMM

(No Prosite data available for DKFZphfbr2_2dl7.2)
(No Pfam data available for DKFZphfbr2_2dl7.2)
DKFZphfbr2_2d20


group: brain derived 

DKFZphfbr2_2d20 encodes a novel 197 amino acid protein with similarity to Synechocystis sp.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to Synechocystis sp. (PCC 6803) 
complete cDNA, complete cds, EST hits 

potential start at bp 67 matches kozak consensus ANCatgG 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1787 bp 

Poly A stretch at pos. 1768, polyadenylation signal at pos. 1743 

1 TGGGGCGGCC GCGGCGGGAA CATGGAGGAG CTGCTGAGGC GCGAGCTGGG 
51 CTGCAGCTCT GTCAGGGCCA CGGGCCACTC GGGGGGCGGG TGCATCAGCC 
101 AGGGCCGGAG CTACGACACG GATCAAGGAC GAGTGTTCGT GAAAGTGAAC 
151 CCCAAGGCGG AGGCCAGAAG AATGTTTGAA GGTGAGATGG CAAGTTTAAC 
201 TGCCATCCTG AAAACAAACA CGGTGAAAGT GCCCAAGCCC ATCAAGGTTC 
251 TGGATGCCCC AGGCGGCGGG AGCGTGCTGG TGATGGAGCA CATGGACATG 
301 AGGCATCTGA GCAGTCATGC TGCAAAGCTT GGAGCCCAGC TGGCCGATTT 
351 ACACCTTGAT AACAAGAAGC TTGGAGAGAT GCGCCTGAAG GAGGCGGGCA 
401 CAGTGTGGAG AGGAGGTGGG CAGGAGGAAC GGCCCTTTGT GGCCCGGTTT 
451 GGATTTGACG TGGTGACGTG CTGTGGATAC CTCCCCCAGG TGAATGACTG 
501 GCAGGAGGAC TGGGTCGTGT TCTATGCCCG GCAGCGCATT CAGCCCCAGA 
551 TGGACATGGT GGAGAAGGAG TCTGGGGACA GGGAGGCCCT CCAGCTTTGG 
601 TCTGCTCTGC AGTAAAAGAT CCCTGACCTG TTCCGTGACC TGGAGATCAT 
651 CCCAGCCTTA CTCCACGGGG ACCTCTGGGG TGGAAACGTA GCAGAGGATT 
701 CCTCTGGGCC GGTGATTTTT GACCCAGCTT CTTTCTACGG CCACTCGGAA 
751 TATGAGCTGG CAATAGCTGG CATGTTTGGG GGCTTTAGCA GCTCCTTTTA 
801 CTCCGCCTAC CACGGCAAAA TCCCCAAGGC CCCAGGATTC GAGAAGCGCC 
851 TTCAGTTGTA TCAGCTCTTT CACTACTTGA ACCACTGGAA TCATTTTGGA 
901 TCGGGGTACA GAGGATCCTC CCTGAACATC ATGAGGAATC TGGTCAAGTG 
951 AGCGGGCCTT ACTCTGGAAG GAGGTCTCAG AGGTTTCTCC ACAGTCCTCT 
1001 TCTGGGCAAA TTCTTGTTTC TTCACATGCC GGACTAGCTT AAGACCAATG 
1051 CAGTAGCTTA TTTCCAAGCC TTGCAAAGTA TATAATATCT AAGAGGAAAG 
1101 GTTTTGTCAT CCCAGCGTTG TCCACTTTGT GGGGCTTTGT AGGTAGACGG 
1151 AGCCACACTA CAGGCAGGGT ATGAGCAGAG GGATGTATGG AGTGTGGGCG 
1201 ACTCTGAGCC TCACTGCTGC TGCAAGGTGG GGAAACTGTA AGTGAACCCC 
1251 TGTGGGTGCG GGGGAGGGTA TCCGGTGCGC AGGGAGGTGG CCAGCGCCCC 
1301 CGGGCACTGC TGCTCATAGG TACCTTTCCG CTGCCTCCTC CCTGCTCTCC 
1351 TGTGCAGGAA TGTCTCTGAG CTGTTCACGT TGATGCTTCT TGGTTGGCAA 
1401 GACTTGGGTG TAGACATGAA ACCACCTTAC TAAAAGCGTC TTAAAATGAC 
1451 CAATTCCAGA ATCAAGCGTA TTCCGTTTTC CTCCTGCATG ATCCCTGGGC 
1501 CCTCCCGCAG GCTGAGCAAG TCTGTAAACT GATTCTGGGA GAAACCAAGC 
1551 TGCTGGCCGT AGGATGTCCT TGGGTACATC CAGGAGTCTT CATTGCTTCT 
1601 GTTATTACCC CGTCTCCTCT GCCATTTTCT ACAGCTTGCT GAGTTGTCAT 
1651 TCCTTTGCAA CATTAAAATA CATGCTGAAC TCATATTTTT CCTTCCTTCA 
1701 CTGTTGTAGT AAAGAGACAT ATTTCATGAA TGGCATTGAT GCTAATAAAC 
1751 CCTTTGCCCA AAAATTTGAA AAAAAAAAAA AAAAAAA 
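
The poly A stretch and polyadenylation signal positions reported for this clone can be reproduced from the end of the insert sequence. The following is a minimal Python sketch, not the method actually used to generate the annotation: the function names, the `min_len` threshold, and the choice of `AATAAA`/`ATTAAA` as signal hexamers are assumptions, and the reported positions are assumed to be 0-based offsets, which is consistent with the figures for this clone.

```python
# Hexamers commonly treated as polyadenylation signals (assumed set).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signal(seq, signals=POLYA_SIGNALS):
    """Return the 0-based index of the last signal hexamer, or None."""
    seq = seq.upper()
    best = max(seq.rfind(s) for s in signals)
    return best if best >= 0 else None


def find_polya_stretch(seq, min_len=10):
    """Return the 0-based index where the terminal run of A's starts, or None."""
    seq = seq.upper()
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) < min_len:
        return None
    return len(stripped)


# Last 87 bases of the DKFZphfbr2_2d20 insert (positions 1701 to 1787).
TAIL = (
    "CTGTTGTAGTAAAGAGACATATTTCATGAATGGCATTGATGCTAATAAAC"
    "CCTTTGCCCAAAAATTTGAAAAAAAAAAAAAAAAAAA"
)
OFFSET = 1700  # 0-based index of the first base of TAIL within the insert

print(find_polya_signal(TAIL) + OFFSET)   # 1743
print(find_polya_stretch(TAIL) + OFFSET)  # 1768
```

Searching from the 3' end keeps the sketch simple; a fuller implementation would also restrict the signal to a window upstream of the poly A stretch.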


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 
ORF from 22 bp to 612 bp; peptide length: 197
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (117-139)


1 MEELLRRELG CSSVRATGHS GGGCISQGRS YDTDQGRVFV KVNPKAEARR 
51 MFEGEMASLT AILKTNTVKV PKPIKVLDAP GGGSVLVMEH MDMRHLSSHA 
101 AKLGAQLADL HLDNKKLGEM RLKEAGTVWR GGGQEERPFV ARFGFDVVTC
151 CGYLPQVNDW QEDWVVFYAR QRIQPQMDMV EKESGDREAL QLWSALQ 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2d20, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2d20, frame 1 


Report for DKFZphfbr2_2d20.1


[LENGTH] 197

[MW] 21963.25

[pI] 6.96

[HOMOL] PIR:S76790 hypothetical protein - Synechocystis sp. (strain PCC 6803) 9e-12

[SUPFAM] hypothetical protein b1725 1e-06

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MYRISTYL 2

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 2

(KW] Alpha_Beta 


SEQ MEELLRRELGCSSVRATGHSGGGCISQGRSYDTDQGRVFVKVNPKAEARRMFEGEMASLT 

PRD ccchhhhhccccceeeeccccccceeeccccccccceeeeeeccchhhhhhhhhhhhhhh 

SEQ AILKTNTVKVPKPIKVLDAPGGGSVLVMEHMDMRHLSSHAAKLGAQLADLHLDNKKLGEM 

PRD hhhhhheeeeccceeeecccccceeeeecccccccchhhhhhhhhhhhhhhcccccchhh 

SEQ RLKEAGTVWRGGGQEERPFVARFGFDVVTCCGYLPQVNDWQEDWVVFYARQRIQPQMDMV

PRD hhhhhccccccccccccceeeccccceeeccccccccccccchhhhhhhhhhhhhhhhhh 

SEQ EKESGDREALQLWSALQ 

PRD hhhccchhhhhhhhccc 


Prosite for DKFZphfbr2_2d20.1


PS00002   20->24    GLYCOSAMINOGLYCAN   PDOC00002
PS00005   13->16    PKC_PHOSPHO_SITE    PDOC00005
PS00005   67->70    PKC_PHOSPHO_SITE    PDOC00005
PS00008   22->28    MYRISTYL            PDOC00008
PS00008   104->110  MYRISTYL            PDOC00008
PS00029   96->118   LEUCINE_ZIPPER      PDOC00029


(No Pfam data available for DKFZphfbr2_2d20.1)
DKFZphfbr2_2gl8


group: brain derived 

DKFZphfbr2_2gl8 encodes a novel 229 amino acid protein with similarity to the human dJ30M3.2
gene product.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

J30M3.2 extension of genmodel 

complete cDNA, complete cds, EST hits 
(mouse ESTs with >90% Identities) 

Sequenced by Qiagen 

Locus: /map="6p22.1-22"

Insert length: 2444 bp 

Poly A stretch at pos. 2425, no polyadenylation signal found 

1 TGGTCGAGGG TCGACGGTAT CGATAAGTTT TTTTTTTTTT TTTTTTTTTT
51 TGGAAAGCAA GGATCACACT TCCCCCTCCC TGTTCCTTAA TCCCTTTTCT 

101 AAAAAGGGGG GAAAATCCGG ATGGATTTTA GGGATTGGTC TGGTGTCAGC 

151 TGTGTCTTAT TGCACACCTA AATCCTGATT ATAGGCTTTT CATTTCTCCG 

201 CAAAGCCTTT ATTTTGGCAG TTAAGCCAAA TGTGTTTTCC AGAAAGTTAG 

251 TTATTTTCTC CTCTTTCTTT CCTTTCTTTC CTCCCTTTTT CCCGTCTGAC 

301 CCCAAACGTT ATTGTCCAAA CATGACTGGA CAGCAGCTTT TGTTTCTTGA 

351 CCCTGTAATA TGACAGTCTG CTAATATTGA CAGAAGGTGC AGTTTTTGGG 

401 TTATAGTCGT GATTTTCGCT AATCAATCAT ATTAGCAGGA AAAAAAATGA 

451 CTTGTTTCTG TTGTACTTGA GTCTTAAGAA AAAGTGCCCA TAGTTTAGTG 

501 ACAATTTCCA AAGGCTTTAG TACCACCTGT ATTTCAAAAT GGGGGACCCA 

551 AACTCCCGGA AGAAACAAGC TCTGAACAGA CTACGTGCTC AGCTTAGAAA 

601 GAAAAAAGAA TCTCTAGCTG ACCAGTTTGA CTTCAAGATG TATATTGCCT 

651 TTGTATTCAA GGAGAAGAAG AAAAAGTCAG CACTTTTTGA AGTGTCTGAG 

701 GTTATACCAG TCATGACAAA TAATTATGAA GAAAATATCC TGAAAGGTGT 

751 GCGAGATTCC AGCTATTCCT TGGAAAGTTC CCTAGAGCTT TTACAGAAGG

801 ATGTGGTACA GCTCCATGCT CCTCGATATC AGTCTATGAG AAGGGATGTA 

851 ATTGGCTGTA CTCAGGAGAT GGATTTCATT CTTTGGCCTC GGAATGATAT 

901 TGAAAAAATC GTCTGTCTCC TGTTTTCTAG GTGGAAAGAA TCTGATGAGC 

951 CTTTTAGGCC TGTTCAGGCC AAATTTGAGT TTCATCATGG TGACTATGAA
1001 AAACAGTTTC TGCATGTACT GAGCCGCAAG GACAAGACTG GAATCGTTGT 
1051 CAACAATCCT AACCAGTCAG TGTTTCTCTT CATTGACAGA CAGCACTTGC 
1101 AGACTCCAAA AAACAAAGCT ACAATCTTCA AGTTATGCAG CATCTGCCTC 
1151 TACCTGCCAC AGGAACAGCT CACCCACTGG GCAGTTGGCA CCATAGAGGA 
1201 TCACCTCCGT CCTTATATGC CAGAGTAGAG TACTGACCAG CAAAATGGAG 
1251 AAGATCAGAG AATGCAGCAG CAGTTTTTTT TCTTGTTTTC TTACCACTTT 
1301 ATTCTTTCAG AGTTTAAAGA AAATGGACTC ATGCACAGAA CACTATGCAT 
1351 TTTGAAACTT GTTCATCCTG GATTTTTTTA AATCATTTTT ATCTCAGAAC 
1401 TTAAACAAAA ATTAGATGTC GTGCACGGAC TGTGTGAAAG AAGATGCTTT 
1451 GCATATTTGC TGCACTGCAT CAGTATCTTA CTAAAAATGT GAAATGAAAG
1501 GACTATTGTA CACTGAAATG CTTAAATGTA TCTGAAAGCA CAAGGTGATA 
1551 CTCATTTTTA TGGTCTTCCC ATTTGTGCTG GTTTTTGCCT CTTTGACATC 
1601 TGTCATCAGT ATTTAGAGGG TGAGAAGTGA ATGTAACAGG TATAAATAAC 
1651 ATTTTTAAAA ACAATAACTT TGCTATAATC ACAGTTGTTC CAGAGCACTG 
1701 TCAGATACAT TCTAATGACC AGAACTGGTT TAAAAAAAGA AAATACAACC 
1751 ATGGGAAAGA AATCTTAAAT GAAAAACGCA TCTCATTGTA GGCATTTTTG 
1801 CCTCATATTT TACTGGGCCA TGTTTGTTTC CTGGTACTCA TGTATTTTTT 
1851 TTTTTTCCAG ATCTCTTTCC CCAAGTTGCT ATTGTAAGAG TATTCTGCTG 
1901 CGTGTGGATG CAGTTATACA CATTAAAGCA GATCTGGAGT CTGAAGTAGC 
1951 TATAAAGCAG CTATAAAACA GAAATACATG CATAGCTGCA GAAACCATGA 
2001 TAGGTAGAGG ACTTTTCTTT TGGTTTTGTT TTGTTTTGTT TTGTTTTGTt/ 

2051 TTTGGTTTTA CAGAGAAGAG ATTTTTATTA CAAAGAAAAA AATTCCAGTG

2101 AATTGTGCAG AAATGCTGGT TTTTACACCA TCCTAAAGAA AAACTTTACA 
2151 AGGGTGTTTT GGAGTAGAAA AAAGGTTATA AAGTTGGAAT CTTAAATTGT 
2201 AAAATTAACC ATTGAGTGTC AAAGTTCTAA AAGCAGAACT CATTTCGTGC 
2251 AATGAACATA AGGAAAGACT ACTGTATAGG TTTTTTTTTT TCTCCTTTTA 
2301 AATGAAGAAA AGCTTTGCTT AAGGGTTGCA TACTTTTATT GGAGTAAATC 
2351 TGAATGATCC TACTCCTTTG GAGTAAGACT AGTGCTTACC AGTTTCCAAT 
2401 TGTATTTAGC TTCTGTTGGA ATTTGAAAAA AAAAAAAAAA AAAA 


BLAST Results 
Entry HS338352 from database EMBL:
human STS EST171398. 
Score = 1747, P = 3.0e-74, identities = 359/365 

Entry HS447255 from database EMBL: 
human STS SHGC-10143. 
Score = 1717, P = 6.5e-73, identities = 365/383

Entry HS30M3 from database EMBLNEW: 

Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains
three novel genes, one similar to C. elegans Y63D3A.4 and one similar
to (predicted) plant, worm, yeast and archaea bacterial genes, and the
first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG
islands.

Score = 6646, P = 0.0e+00, identities = 1344/1355


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 539 bp to 1225 bp; peptide length: 229 
Category: putative protein 


1 MGDPNSRKKQ ALNRLRAQLR KKKESLADQF DFKMYIAFVF KEKKKKSALF 
51 EVSEVIPVMT NNYEENILKG VRDSSYSLES SLELLQKDVV QLHAPRYQSM
101 RRDVIGCTQE MDFILWPRND IEKIVCLLFS RWKESDEPFR PVQAKFEFHH
151 GDYEKQFLHV LSRKDKTGIV VNNPNQSVFL FIDRQHLQTP KNKATIFKLC 
201 SICLYLPQEQ LTHWAVGTIE DHLRPYMPE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2gl8, frame 2 

TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel
protein)"; Human DNA sequence from clone 30M3 on chromosome
6p22.1-22.3. Contains three novel genes, one similar to C. elegans
Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains
ESTs, GSSs and putative CpG islands., N = 1, Score = 470, P = 1.1e-44


>TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)";
Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains
three novel genes, one similar to C. elegans Y63D3A.4 and one similar to
(predicted) plant, worm, yeast and archaea bacterial genes, and the first
exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG islands.
Length = 86

HSPs:

Score = 470 (70.5 bits), Expect = 1.1e-44, P = 1.1e-44
Identities = 86/86 (100%), Positives = 86/86 (100%)

Query: 144 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 203 

AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 
Sbjct: 1 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 60 

Query: 204 LYLPQEQLTHWAVGTIEDHLRPYMPE 229 

LYLPQEQLTHWAVGTIEDHLRPYMPE 
Sbjct: 61 LYLPQEQLTHWAVGTIEDHLRPYMPE 86 


Pedant information for DKFZphfbr2_2gl8, frame 2


Report for DKFZphfbr2_2gl8.2 
[LENGTH] 229

[MW] 27083.42

[pI] 9.04

[HOMOL] TREMBL:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)"; Human
DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains three novel genes, one
similar to C. elegans Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG
islands. 6e-47

[PROSITE] MYRISTYL 2

[PROSITE] CAMP_PHOSPHO_SITE

[PROSITE] CK2_PHOSPHO_SITE

[PROSITE] TYR_PHOSPHO_SITE

[PROSITE] PKC_PHOSPHO_SITE

[PROSITE] ASN_GLYCOSYLATION

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 5.24 %


SEQ MGDPNSRKKQALNRLRAQLRKKKESLADQFDFKMYIAFVFKEKKKKSALFEVSEVIPVMT

PRD cccccchhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhheeeeec

SEQ NNYEENILKGVRDSSYSLESSLELLQKDVVQLHAPRYQSMRRDVIGCTQEMDFILWPRND

SEG xxxxxxxxxxxx

PRD cchhhhhhhcccccccccchhhhhhhhhhhhhhccccccccceeecccccceeeecccch

SEQ IEKIVCLLFSRWKESDEPFRPVQAKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFL

SEG

PRD hhhhhhhhhhhccccccccccccccccccccchhhhhhhhhhhcccceeeeccccceeee

SEQ FIDRQHLQTPKNKATIFKLCSICLYLPQEQLTHWAVGTIEDHLRPYMPE

SEG

PRD eeecccccccccceeeeeeeeeeeeeccccccccceeeecccccccccc


Prosite for DKFZphfbr2_2gl8.2


PS00001   175->179  ASN_GLYCOSYLATION   PDOC00001
PS00004   22->26    CAMP_PHOSPHO_SITE   PDOC00004
PS00004   44->48    CAMP_PHOSPHO_SITE   PDOC00004
PS00005   6->9      PKC_PHOSPHO_SITE    PDOC00005
PS00005   99->102   PKC_PHOSPHO_SITE    PDOC00005
PS00005   162->165  PKC_PHOSPHO_SITE    PDOC00005
PS00005   189->192  PKC_PHOSPHO_SITE    PDOC00005
PS00006   25->29    CK2_PHOSPHO_SITE    PDOC00006
PS00006   80->84    CK2_PHOSPHO_SITE    PDOC00006
PS00006   162->166  CK2_PHOSPHO_SITE    PDOC00006
PS00006   218->222  CK2_PHOSPHO_SITE    PDOC00006
PS00007   69->77    TYR_PHOSPHO_SITE    PDOC00007
PS00008   70->76    MYRISTYL            PDOC00008
PS00008   168->174  MYRISTYL            PDOC00008

(No Pfam data available for DKFZphfbr2_2gl8.2)
DKFZphfbr2_2hl
group: brain derived 

DKFZphfbr2_2hl encodes a novel 180 amino acid protein with weak similarity to C. elegans
D2007.4 protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to C. elegans D2007.4 protein
CpG island in 5' region, complete cDNA
Sequenced by Qiagen 
Locus : unknown 
Insert length: 957 bp 

Poly A stretch at pos. 939, polyadenylation signal at pos. 916 

1 GGGGGTCCCT GACTTTATAT GGCTGCTCCT GGCGAGCGAC TGAGTCGTCC 
51 GTGAGGAAAA AGAGGCGAGG CTTTTCCGAG ATCGTCTCAG CGATGGCGCT 
101 TCGGTCGCGG TTTTGGGGGT TGTTCTCGGT TTGCAGGAAC CCTGGGTGCA
151 GGTTCGCAGC CCTGTCAACC AGCTCCGAGC CGGCAGCGAA ACCTGAAGTG 
201 GACCCTGTGG AAAATGAAGC TGTCGCCCCA GAATTCACCA ACCGGAACCC 
251 CCGGAACCTG GAGCTTTTGT CTGTAGCCAG GAAAGAGCGG GGCTGGCGGA 
301 CGGTGTTTCC CTCCCGTGAG TTCTGGCACA GGTTGCGAGT TATAAGGACT 
351 CAGCATCATG TAGAAGCACT TGTGGAGCAT CAGAATGGCA AGGTTGTGGT 
401 TTCGGCCTCC ACTCGTGAGT GGGCTATTAA AAAGCACCTT TATAGTACCA
451 GAAATGTGGT GGCTTGTGAG AGTATAGGAC GAGTGCTGGC ACAGAGATGC 
501 TTAGAGGCGG GAATCAACTT CATGGTCTAC CAACCAACCC CGTGGGAGGC 
551 AGCCTCAGAC TCGATGAAAC GACTACAAAG TGCCATGACA GAAGGTGGTG 
601 TGGTTCTACG GGAACCTCAG AGAATCTATG AATAAATGGA AGCATTAATT 
651 GTTTTGAACA TGTAAATATA AATCTGTCAG CCACTACAGC CATCAAAAGA 
701 GAGCATCTGG AAGAACAGCC AGCTTGGAAG TTTTACAGCA ATAATGTTGC 
751 AGTGGAATAT TATTTGTAGT TAAGGTCATC CTCCTCCCCT TTCTGTTTTT 
801 TTAAATCAAG AACTACGTTC TGCCCCTCTC TTGGGCTTCA GAAGCATCTA 
851 AGAAAAGCAG TCATCAATTA TAATTAACTT TCAAAGGGCA AGTCAGAAGT 
901 TGTTTATAAA TTACAAAATA AAGGCATATT ATGAACTCTA AAAAAAAAAA 
951 AAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 93 bp to 632 bp; peptide length: 180 
Category: similarity to known protein 
Classification: unset 


1 MALRSRFWGL FSVCRNPGCR FAALSTSSEP AAKPEVDPVE NEAVAPEFTN 
51 RNPRNLELLS VARKERGWRT VFPSREFWHR LRVIRTQHHV EALVEHQNGK 
101 VVVSASTREW AIKKHLYSTR NVVACESIGR VLAQRCLEAG INFMVYQPTP
151 WEAASDSMKR LQSAMTEGGV VLREPQRIYE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hl, frame 3

PIR:S44789 D2007.4 protein - Caenorhabditis elegans, N = 1, Score = 194, P = 2e-15

PIR:JC5753 ribosomal protein L18 - Vibrio proteolyticus, N = 1, Score = 121, P = 1.1e-07


>PIR:S44789 D2007.4 protein - Caenorhabditis elegans 
Length = 170 

HSPs: 

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15
Identities = 51/134 (38%), Positives = 78/134 (58%)

Query:  48 FTNRNPRNLELLSVARKERGWRTVFP-- SREFWHRLRVIRTQHHVEA-LVEHQNGKVVVS 104
           F NRNPRN EL+ G++ +R + +++ ++ + H E LV +Q+G VV+S
Sbjct:   9 FVNRNPRNNELMGRQAPNTGYQFEKDRAARSYIYKVELVEGKSHREGRLVHYQDG-VVIS 67

Query: 105 ASTREWAIKKHLYSTRNVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQ-- 162
           AST+E +I LYS + A +IGRVLA RCL++GI+F + T EA S +
Sbjct:  68 ASTKEPSIASQLYSKTDTSAALNIGRVLALRCLQSGIHFAMPGATK-EAIEKSQHQTHFF 126

Query: 163 SAMTEGGVVLREPQRI 178
           A+ E G+ L+EP +
Sbjct: 127 KALEEEGLTLKEPAHV 142


Pedant information for DKFZphfbr2_2hl, frame 3


Report for DKFZphfbr2_2hl.3


[LENGTH] 180

[MW] 20576.57

[pI] 9.63

[HOMOL] PIR:S44789 D2007.4 protein - Caenorhabditis elegans 2e-13

[FUNCAT] j mRNA translation and ribosome biogenesis [H. influenzae, HI0794] 2e-04

[SUPFAM] Escherichia coli ribosomal protein L18 8e-06

[KW] Alpha_Beta


SEQ MALRSRFWGLFSVCRNPGCRFAALSTSSEPAAKPEVDPVENEAVAPEFTNRNPRNLELLS 

PRD ccccccceeeeeeeecccccceeeecccccccccccccccceeeecccccccccchhhhh 

SEQ VARKERGWRTVFPSREFWHRLRVIRTQHHVEALVEHQNGKVVVSASTREWAIKKHLYSTR 

PRD hhhhcccccccchhhhhhhhhhccccchhhhhhhhhcccceeeeechhhhhhhhhhhhcc 

SEQ NVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQSAMTEGGVVLREPQRIYE 

PRD ccceeehhhhhhhhhhhhhcceeeeeccccchhhhhhhhhhhhhhhccceeecccccccc 


(No Prosite data available for DKFZphfbr2_2h1.3) 
(No Pfam data available for DKFZphfbr2_2h1.3) 


DKFZphfbr2_2h10 


group: brain derived 

DKFZphfbr2_2h10 encodes a novel 220 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 2176 bp 

Poly A stretch at pos. 2161, polyadenylation signal at pos. 2143 


1 TGGGGAGTAT TCTAATTATA TTTTATATTT AATAAATTAT TTTTCTATTT 
51 CTTTGTTATA TTAAGTTGCA CACTTGTTTC TTTTATCCAG AAAGTTTAGT 
101 ATAATAAAAA TAGTTTTAAG ATTAACTGTG AATGTAAAGG AAAAGTATTA 
151 TTAATTATTT CAGGAAATTG CAAGACCTAA CATGGCTGAA AGAGAAACAG 
201 AAACATCAAA TTCTGAAAGT AAACAAGATA AAGCTGCTTC TTCAAAAGAA 
251 AAAAATGGAT GTAATGCAAA TTCATTTGAA GGCTCATCAA CAACAAAAAG 
301 TGAAGAAAGC ATAACAGTTT CAGATAAGGA AAATGAAACC TGTCTTGCAG 
351 ACCAGGAAAC TGGCTCAAAA AACATCGTCA GTTGTGATTC AAATATTGGT 
401 GCAGATAAAG TGGAAAAGAA AAAACAAATA CAACACGTTT GTCAGGAAAT 
451 GGAGTTGAAG ATGTGCCAGA GTTCAGAAAA CATAATCTTA TCTGATCAGA 
501 TTAAAGATCA CAACTCCAGT GAAGCCAGAT TTTCTTCAAA GAATATTAAG 
551 GATTTGCGAT TAGCATCAGA TAATGTAAGC ATTGATCAGT TTTTGAGAAA 
601 AAGACATGAA CCTGAATCTG TTAGTTCTGA TGTTAGCGAG CAAGGCAGTA 
651 TTCATTTGGA ACCTCTGACT CCATCCGAGG TACTTGAGTA TGAAGCCACA 
701 GAGATTCTTC AGAAAGGTAG TGGTGATCCT TCAGCCAAGA CTGATGAAGT 
751 AGTGTCTGAT CAAACAGATG ACATTCCTGG AGGAAATAAC CCTAGCACAA 
801 CAGAGGCAAC AGTAGACCTG GAAGATGAAA AAGAAAGAAG TTGAAATTAG 
851 TCATTTTAAG TTTCAGTGTA CCAACGATAA GGGCATTTGG AACAGTGCTA 
901 TCAGGTGAGC TCAGTGGTGC TGTTGTAGGT TCAGAAATGG AAATATGTAA 
951 GGGAGGTCAC ACATACACTT TACCTGTATG TTCAACCTAT GTTATCAAAC 
1001 AAACCAATTC ACCAATAATA GCATGATTAG TAGGGATTCC CAAAAAGTTT 
1051 TTAAAAACAC GAACAGGATT TTAATGATAA TTAAATTTGC AGTGGAAAGG 
1101 TCTCATTTAA TGGTTTTCAA GGAAATGGGA TTTGGTTGCT GACATGAATT 
1151 GATGATATTA GTAATATTTA TAAAGCCTTT CAAACTTCCA TCAATCCTAA 
1201 GCTAAAAATC TTTATTACCT GTATATCCTT TTCAGTTAAC TGAGAGGAAG 
1251 GGATTTGGAA ACCATGTACT TTTGGGGAGT AATTGATTAA AAACAATGGC 
1301 TGATTGGCAT TGTTAATGAA GGCTTTATTT GTGAGGATGA TGCTGGTAAA 
1351 TGGAGCATGC TTAGAGTACT AAATTGATCT AATGAGAATT TGGATGAACA 
1401 TAAACTTAAT TTTGGATTTA ATATAACATT CCAGTCAGAC GCATGTAAAC 
1451 AGAATATTTG AATCTTTGTA CCTCCATACA AGTGTTAGCC TGCCAGGCTG 
1501 TAAGCTTACC TTAATTAAAC TTTCAGTGAA AGTGGAATTA TTAAGATATA 
1551 AATTTATATT TGTGCTTTTT GTCAGTGTGT AAGCTGTGTA GAAATTCTTT 
1601 GATGTATTAG TTGTATTAAT GTAAAGTAGA AACCCATTCT TGAAACTCCT 
1651 GTAGCTATTA TGCTTTTAAT ATTGTTTTAA TGTTCTTCCT TAGAAATAGG 
1701 CCCATAAAAA TGGTCTGGAA GCCAAACCAA AGTATGGTAT AATGTAGATA 
1751 TTGTAAAGCA GTAAACTGAA AACATGTCCT GGCATGTATT CAGCCATGTT 
1801 TAAGTGACTT TTCTGTAATT GTAAAATAAA AACTTCAAAT GGGACCTAAA 
1851 ACAGTGATGT AAAAGAACTG GTTTTGGAAA TTTAGCCTAA TTTATCTATA 
1901 AGATGGCTGC TAAATTGATT TTTCAGTTCT TTTTATCATC TAAAATATAA 
1951 TAGATATAGA AATGAATAAT ATGAAGAACA GTAGTTTGCT TTGAAATACT 
2001 AATAAACTTT TATTTAAGAT GCTTCATTTT TACTTCTTAA AACGTGCTTT 
2051 GGATTCTTAA ATTTTGTTTC ACTGAATGTT CAATGTTTTA AATGGCGATT 
2101 AAAATACTCT GCTGTATATA GTAGTTTTTG AGTAAATATT TGCAATAAAA 
2151 ATCTGCCCCC GAAAAAAAAA AAAAAA 


BLAST Results 


Entry G35287 from database EMBL: 
human STS SHGC-37375. 
Score = 2163, P = 2.8e-91, identities = 437/441 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 182 bp to 841 bp; peptide length: 220 
Category: putative protein 
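
The relationship between the ORF coordinates, the reading frame and the peptide length quoted in these reports can be checked with a short calculation. The sketch below is illustrative only: it assumes that the coordinates are 1-based and inclusive, that the stop codon is not part of the quoted span, and that the frame number is derived from the start position modulo 3. The function name `orf_summary` is hypothetical and not part of the original analysis pipeline.

```python
def orf_summary(start_bp, end_bp):
    """Return (reading frame, peptide length) for a 1-based, inclusive ORF.

    Assumption: the span excludes the stop codon, so its length
    must be an exact multiple of three.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start_bp - 1) % 3 + 1  # frames numbered 1, 2, 3
    return frame, span // 3


# ORF from 182 bp to 841 bp, reported as frame 2 with 220 residues
print(orf_summary(182, 841))
```

Under these assumptions the same function also reproduces the values quoted for the other clones in this section, for example frame 3 and 180 residues for an ORF from 93 bp to 632 bp.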


1 MAERETETSN SESKQDKAAS SKEKNGCNAN SFEGSSTTKS EESITVSDKE 
51 NETCLADQET GSKNIVSCDS NIGADKVEKK KQIQHVCQEM ELKMCQSSEN 
101 IILSDQIKDH NSSEARFSSK NIKDLRLASD NVSIDQFLRK RHEPESVSSD 
151 VSEQGSIHLE PLTPSEVLEY EATEILQKGS GDPSAKTDEV VSDQTDDIPG 
201 GNNPSTTEAT VDLEDEKERS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2h10, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2h10, frame 2 


Report for DKFZphfbr2_2h10.2 


[LENGTH] 220 

[MW] 24109.02 

[pI] 4.51 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 4e-05 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 4e-05 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 8 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 3 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] Alpha_Beta 


SEQ MAERETETSNSESKQDKAASSKEKNGCNANSFEGSSTTKSEESITVSDKENETCLADQET 

PRD cccccccccccccchhhhhhhhccccccccccccccccceeeeeeeeccccccccccccc 

SEQ GSKNIVSCDSNIGADKVEKKKQIQHVCQEMELKMCQSSENIILSDQIKDHNSSEARFSSK 

PRD cccceeeecccccchhhhhhhhhhhhhhhhhhhhhhccceeeeccccccccccccccccc 

SEQ NIKDLRLASDNVSIDQFLRKRHEPESVSSDVSEQGSIHLEPLTPSEVLEYEATEILQKGS 

PRD cchhhhhhcccchhhhhhhhcccccccccccccccceeecccccccchhhhhhhcccccc 

SEQ GDPSAKTDEVVSDQTDDIPGGNNPSTTEATVDLEDEKERS 

PRD ccccccccccccccccccccccccccceeeehhhhhhccc 


Prosite for DKFZphfbr2_2h10.2 


PS00001 51->55 ASN_GLYCOSYLATION PDOC00001 
PS00001 111->115 ASN_GLYCOSYLATION PDOC00001 
PS00001 131->135 ASN_GLYCOSYLATION PDOC00001 
PS00005 20->23 PKC_PHOSPHO_SITE PDOC00005 
PS00005 37->40 PKC_PHOSPHO_SITE PDOC00005 
PS00005 47->50 PKC_PHOSPHO_SITE PDOC00005 
PS00005 118->121 PKC_PHOSPHO_SITE PDOC00005 
PS00005 184->187 PKC_PHOSPHO_SITE PDOC00005 
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 20->24 CK2_PHOSPHO_SITE PDOC00006 
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006 
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 205->209 CK2_PHOSPHO_SITE PDOC00006 
PS00008 26->32 MYRISTYL PDOC00008 
PS00008 34->40 MYRISTYL PDOC00008 
PS00008 201->207 MYRISTYL PDOC00008 


Pfam for DKFZphfbr2_2h10.2 


HMM_NAME TNFR/NGFR cysteine-rich region 

HMM       *CpeG.tYtD.WNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC* 
           +E+ T +D +N ++C E G+ + +C+++ + 
Query  40  SEESITVSDKEN--ETC--LADQET--GSKNIVSCDSNIGADK  76 


DKFZphfbr2_2i17 


group: intracellular transport and trafficking 

DKFZphfbr2_2i17.3 encodes a novel 201 amino acid putative GTP-binding protein related to 
the [...] superfamily of GTPases. Rab proteins are localised to the 
surface of organelles and vesicles involved in the secretory (biosynthetic) and 
[...] pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of 
transport vesicles to their acceptor membranes. Rab1B is essential for the intracellular 
transport of low density lipoprotein (LDL) receptor; it is discussed as a universal 
mediator of endoplasmic reticulum to Golgi transport of membrane glycoproteins in mammalian 
cells. 

[...] glycoproteins 


Medline 


96245776: Intracellular transport and maturation of nascent low density 
lipoprotein receptor is blocked by mutation in the Ras-related 
GTP-binding protein, RAB1B 

strong similarity to rab1 

complete cDNA, complete cds, start at 47, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1985 bp 

Poly A stretch at pos. 1901, polyadenylation signal at pos. 1859 

1 GGGAGCAGAG TCGACTGGGA GCGACCGAGC GGGCCGCCGC CGCCGCCATG 
51 AACCCCGAAT ATGACTACCT GTTTAAGCTG CTTTTGATTG GCGACTCAGG 
101 CGTGGGCAAG TCATGCCTGC TCCTGCGGTT TGCTGATGAC ACGTACACAG 
151 AGAGCTACAT CAGCACCATC GGGGTGGACT TCAAGATCCG AACCATCGAG 
201 CTGGATGGCA AAACTATCAA ACTTCAGATC TGGGACACAG CGGGCCAGGA 
251 ACGGTTCCGG ACCATCACTT CCAGCTACTA CCGGGGGGCT CATGGCATCA 
301 TCGTGGTGTA TGACGTCACT GACCAGGAAT CCTACGCCAA CGTGAAGCAG 
351 TGGCTGCAGG AGATTGACCG CTATGCCAGC GAGAACGTCA ATAAGCTCCT 
401 GGTGGGCAAC AAGAGCGACC TCACCACCAA GAAGGTGGTG GACAACACCA 
451 CAGCCAAGGA GTTTGCAGAC TCTCTGGGCA TCCCCTTCTT GGAGACGAGC 
501 GCCAAGAATG CCACCAATGT CGAGCAGGCG TTCATGACCA TGGCTGCTGA 
551 AATCAAAAAG CGGATGGGGC CTGGAGCAGC CTCTGGGGGC GAGCGGCCCA 
601 ATCTCAAGAT CGACAGCACC CCTGTAAAGC CGGCTGGCGG TGGCTGTTGC 
651 TAGGAGGGGC ACATGGAGTG GGACAGGAGG GGGCACCTTC TCCAGATGAT 
701 GTCCCTGGAG GGGGGAGGAG GTACCTCCCT CTCCCTCTCC TGGGGCATTT 
751 GAGTCTGTGG CTTTGGGGTG TCCTGGGCTC CCCATCTCCT TCTGGCCCAT 
801 CTGCCTGCTG CCCTGAGCCC CGGTTCTGTC AGGGTCCCTA AGGGAGGACA 
851 CTCAGGGCCT GTGGCCAGGC AGGGCGGAGG CCTGCTGTGC AGTTGCCTCT 
901 AGGTGACTTT CCAAGATGCC CCCCTACACA CCTTTCTTTG GAACGAGGGC 
951 TCTTCTGTCG GTGTCCCTCC CACCCCCATG TATGCTGCAC TGGGTTCTCT 
1001 CCTTCTTCTT CCTGCTGTGC TGCCCAAGAA CTGAGGGTCT CCCCGGCCTC 
1051 TACTGCCCTG GCTGCAGTCA GTGCCCAGGG CGAGGAATGT GGCCAGGGGA 
1101 TCCAGGACCT GGGATCCAGG GCCCTGGGCT GGACCTCAGG ACAGGCATGG 
1151 AGGCCACAGG GGCCCAGCAG CCCACCCTTT CCTCTCCCCA CTGCCTCCTC 
1201 TCCCTTCCTA CACTCCCAGC TCGAGCCGTC CAGCTGCGGT GGGATCTGAG 
1251 TATATCTAGG GCGGGTGGGC GGGTAGCAGT GCTGGGCCTG TGTCTTGAGC 
1301 CTGGAGGGAG ACTGCTCCTG CCGCCCTCTG CCCTGCCGGA GACAGACCCA 
1351 TGCGCTGCCT GCCCACCGTG CCCCTTTGTC CCCATGTCAG GCGGAGGCGG 
1401 AAGGCCCACC GTGCCAGAGG CTGGGCACCA GCCTTAACCC TCACTCTGCT 
1451 AGCACCTCCT CCCTTTCCCC AAGGTAGCAC ATCTGGCTCA CTCCCCACTC 
1501 CGTCTCTGGA GCCCACCAGG GAAGGCCCTC ATCCCCTGCC GCTACTTCTC 
1551 TGGGGAATGT GGGTTCCATC CAGGATTGGG GGCCTCTCTG CTCACCCACT 
1601 CTGCACCCAG GATCCTAGTC CCCTGCCCTC TGGCACAGCT GCTTCCTGCA 
1651 AGAAAGCAAG TCTTTGGTCT CCCTGAGAAG CCATGTCCCT CGTGCTGTCT 
1701 CTTGCCTGTC CCACCTGTGC CCTGCCCTCC AGCTTGTATT TAAGTCCCTG 
1751 GGCTGCCCCC TTGGGGTGCC CCCCGCTCCC AGGTTCCCCT CTGGTGTCAT 
1801 GTCAGGCATT TTGCAAGGAA AAGCCACTTG GGGAAAGATG GAAAAGGACA 
1851 AAAAAAATTA ATAAATTTCC ATTGGCCCTC GGGTGAGCTG AGGGTTTTTG 
1901 CAAGGAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1951 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAA 


BLAST Results 


No BLAST result 


Medline entries 


91115900: 

A family of ras-like GTP-binding proteins expressed in electromotor 
neurons. 


Peptide information for frame 3 


ORF from 48 bp to 650 bp; peptide length: 201 
Category: strong similarity to known protein 


1 MNPEYDYLFK LLLIGDSGVG KSCLLLRFAD DTYTESYIST IGVDFKIRTI 
51 ELDGKTIKLQ IWDTAGQERF RTITSSYYRG AHGIIVVYDV TDQESYANVK 
101 QWLQEIDRYA SENVNKLLVG NKSDLTTKKV VDNTTAKEFA DSLGIPFLET 
151 SAKNATNVEQ AFMTMAAEIK KRMGPGAASG GERPNLKIDS TPVKPAGGGC 
201 C 
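
The start of the peptide above can be reproduced from the cDNA by translating codons from the first base of the ORF. The following is a minimal sketch, not the software used to generate the listing: it assumes the standard genetic code and 1-based coordinates, and the names `translate` and `CDNA_START` are hypothetical. `CDNA_START` holds only the first 100 bases of the insert, copied from the sequence listing for this clone.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna, orf_start, n_codons):
    """Translate n_codons codons starting at the 1-based position orf_start."""
    seq = "".join(cdna.split()).upper()  # drop the spaces between 10-base blocks
    first = orf_start - 1
    codons = (seq[first + 3 * i:first + 3 * i + 3] for i in range(n_codons))
    return "".join(CODON_TABLE[c] for c in codons)


# First 100 bases of the insert (position numbers removed)
CDNA_START = """
GGGAGCAGAG TCGACTGGGA GCGACCGAGC GGGCCGCCGC CGCCGCCATG
AACCCCGAAT ATGACTACCT GTTTAAGCTG CTTTTGATTG GCGACTCAGG
"""

print(translate(CDNA_START, 48, 10))  # first ten residues of the ORF
```

With the ORF start at 48 bp, the ten translated residues should agree with the first block of the peptide listed above.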

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2i17, frame 3 

SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B., N = 1, Score = 1023, P 
= 2.7e-103 

PIR:S06147 GTP-binding protein rab1B - rat, N = 1, Score = 1013, P = 
3.2e-102 

SWISSPROT:RAB1_DISOM RAS-RELATED PROTEIN ORAB-1., N = 1, Score = 967, P 
= 2.4e-97 

PIR:TVHUYP GTP-binding protein Rab1 - human, N = 1, Score = 966, P = 
3e-97 


>SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 
Length = 201 

HSPs: 

Score = 1023 (153.5 bits), Expect = 2.7e-103, P = 2.7e-103 
Identities = 197/201 (98%), Positives = 199/201 (99%) 

Query:   1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 
           MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 
Sbjct:   1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 

Query:  61 IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 
           IWDTAGQERFRT+TSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 
Sbjct:  61 IWDTAGQERFRTVTSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 

Query: 121 NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 
           NKSDLTTKKVVDNTTAKEFADSLG+PFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 
Sbjct: 121 NKSDLTTKKVVDNTTAKEFADSLGVPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 

Query: 181 GERPNLKIDSTPVKPAGGGCC 201 
           GERPNLKIDSTPVK A GGCC 
Sbjct: 181 GERPNLKIDSTPVKSASGGCC 201 


Pedant information for DKFZphfbr2_2i17, frame 3 


Report for DKFZphfbr2_2i17.3 


[LENGTH] 201 
[MW] 22171.25 
[pI] 5.56 
[HOMOL] SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 1e-112 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL038c] 2e-77 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YFL038c] 2e-77 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] [...] plasma membrane [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YER031c] 8e-46 
[FUNCAT] vacuolar transport [S. cerevisiae, YER031c] 8e-46 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 1e-44 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 1e-30 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] [...] cell cycle control and mitosis [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 9e-24 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 9e-24 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] [...] organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-17 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-17 
[FUNCAT] [...] activities [S. cerevisiae, YCR027c] 1e-16 
[FUNCAT] [...] mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 1e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 4e-10 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 9e-09 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 3e-08 
[FUNCAT] [...] proteins [S. cerevisiae, YAL048c] 5e-05 
[BLOCKS] BL01019A ADP-ribosylation factors family proteins 
[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 
[SCOP] d1plk_ 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens)] 2e-41 
[SCOP] d1guaa_ 3.25.1.3.10 Rap1A [human (Homo sapiens)] 5e-60 
[SCOP] [...] ADP-ribosylation factor 1 (ARF1) [human (Homo sapiens)] 2e-03 
[PIRKW] membrane trafficking 1e-110 
[PIRKW] oncogene 1e-25 
[PIRKW] endoplasmic reticulum 1e-105 
[PIRKW] phosphoprotein 1e-105 
[PIRKW] glycoprotein 3e-25 
[PIRKW] prenylated cysteine 1e-110 
[PIRKW] signal transduction 4e-23 
[PIRKW] transforming protein 1e-105 
[PIRKW] purine nucleotide binding 2e-24 
[PIRKW] alternative splicing 5e-2[...] 
[PIRKW] P-loop 1e-110 
[PIRKW] lipoprotein 1e-110 
[PIRKW] proto-oncogene 3e-27 
[PIRKW] methylated carboxyl end 3e-27 
[PIRKW] hydrolase 7e-25 
[PIRKW] membrane protein 1e-105 
[PIRKW] GTP binding 1e-110 
[PIRKW] thiolester bond 5e-76 
[PIRKW] Golgi apparatus 1e-105 
[SUPFAM] ras transforming protein 1e-110 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 2 
[PROSITE] CK2_PHOSPHO_SITE 5 
[PROSITE] SIGMA54_INTERACT_1 1 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 


SEQ MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 

221p- EEEEEEETTTTCHHHHHHHHHHCCCCCCCCCTTTEEEE-EEEEETTEEEEEE 

SEQ IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 

221p- EEECTTTTTTCGGGHHHHHHCCEEEEEEETTBHHHHHHHHHHHHHHHHHHTTTTCEEEEE 

SEQ NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 

221p- ETTTTCCC-CCCHHHHHHHHHHCCCCEEEETTTTTTTHHHHHHHHHHHHHH 

SEQ GERPNLKIDSTPVKPAGGGCC 

221p- 


Prosite for DKFZphfbr2_2i17.3 


PS00001 121->125 ASN_GLYCOSYLATION PDOC00001 
PS00001 133->137 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005 
PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005 
PS00005 135->138 PKC_PHOSPHO_SITE PDOC00005 
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006 
PS00007 27->34 TYR_PHOSPHO_SITE PDOC00007 
PS00008 18->24 MYRISTYL PDOC00008 
PS00008 176->182 MYRISTYL PDOC00008 
PS00017 15->23 ATP_GTP_A PDOC00017 
PS00675 11->25 SIGMA54_INTERACT_1 PDOC00579 


Pfam for DKFZphfbr2_2i17.3 


HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM       *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
           KL+LIGDSGVGKSCLL+RF +++++E+YI+TIGVDF+++TIE+DGKTIK 
Query  10  KLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIK  58 

HMM        LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
           LQIWDTAGQER+R+++++YYRGA+G+++VYD+T+++S+ N+++W++EI+R 
Query  59  LQIWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDR  108 

HMM        HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
           +++  ENV ++LVGNK+DL  +++V+  +++EFA+++G IPF+ETSAK++ 
Query 109  YAS--ENVNKLLVGNKSDLTTKKVVDNTTAKEFADSLG-IPFLETSAKNA  155 

HMM        iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk...rCCCIM* 
           +NVE+AFM+-f+ EI++RM+ +++E +N++ +S++ K +CC 
Query 156  TNVEQAFMTMAAEIKKRMGPGAASGGERPNLKIDSTPVKPAGGGCC--  201 


DKFZphfbr2_2k19 
group: brain derived 

DKFZphfbr2_2k19 encodes a novel 303 amino acid protein with similarity to human KIAA0378 
product. 

The protein contains a leucine zipper, which can mediate protein-protein interaction. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to KIAA0378 

encoded by the genomic clones HS147M19/HS608E8 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1931 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 

1 GGGGGGGGCG CGCGGTGACA GCGCGGGGTT GGCGGCGTGG GACCCAGGGG 
51 GCGACAGAGG CAGCAGCAGC CCGAGGCCTG AGGAGAGGAG ACCGGCGGCG 

101 GCGGCAATGC TGGAGACCCT TCGCGAGCGG CTGCTGAGCG TGCAGCAGGA 

151 TTTCACCTCC GGGCTGAAGA CTTTAAGTGA CAAGTCAAGA GAAGCAAAAG 

201 TGAAAAGCAA ACCCAGGACT GTTCCATTTT TGCCAAAGTA CTCTGCTGGA 

251 TTAGAATTAC TTAGCAGGTA TGAGGATACA TGGGCTGCAC TTCACAGAAG 

301 AGCCAAAGAC TGTGCAAGTG CTGGAGAGCT GGTGGATAGC GAGGTGGTCA 

351 TGCTTTCTGC GCACTGGGAG AAGAAAAAGA CAAGCCTCGT GGAGCTGCAA 

401 GAGCAGCTCC AGCAGCTCCC AGCTTTAATC GCAGACTTAG AATCCATGAC 

451 AGCAAATCTG ACTCATTTAG AGGCGAGTTT TGAGGAGGTA GAGAACAACC 

501 TGCTGCATCT GGAAGACTTA TGTGGGCAGT GTGAATTAGA AAGATGCAAA 

551 CATATGCAGT CCCAGCAACT GGAGAATTAC AAGAAAAATA AGAGGAAGGA 

601 ACTTGAAACC TTCAAAGCTG AACTAGATGC AGAGCACGCC CAGAAGGTCC 

651 TGGAAATGGA GCACACCCAG CAAATGAAGC TGAAGGAGCG GCAGAAGTTT 

701 TTTGAGGAAG CCTTCCAGCA GGACATGGAG CAGTACCTGT CCACTGGCTA 

751 CCTGCAGATT GCAGAGCGGC GAGAGCCCAT AGGCAGCATG TCATCCATGG 

801 AAGTGAACGT GGACATGCTG GAGCAGATGG TCCTGATGGA CATATCGGAC 

851 CAGGAGGCCC TGGACGTCTT CCTGAACTCT GGAGGAGAAG AGAACACTGT 

901 GCTGTCCCCC GCCTTAGGTA GGGTTGACAA ACTTGCATTA GCTGAACCAG 

951 GGCAGTATCG ATGCCACTCC CCTCCAAAGG TGAGACGTGA GAACCATCTG 
1001 CCAGTCACTT ACGCATAAAC CCCCAAGCTC ACAGCCAGCT CCTGGCTCCC 
1051 TAACCCCACG GTTCCACACG GCTGTGTGGC AGCTGCAACA GTGGTGTGGT 
1101 TCCGTCATGA ATTCTTCTCA AAGATTTGAC ATGCTCCACT CCGGTAACTT 
1151 TGGTGAGTTG AGAGCTTTCT TGTTTGTTTT CCCTCCTTTA CCATCCAGAA 
1201 ATCCATTTGA GTCTGCTCCT TGTGGTTAAG GACTGGCGTT TGCAGGGAGG 
1251 TGCGGACTCT CCTGCGGGGC TCACGGGAAA CTCTTCCCTC TTCGTGCGAC 
1301 AGGCATTTAG GGGCGTGCCT GCCATGGGCA AAGCCATGGT GTGTGTTCAG 
1351 CTCTTGGCCT GTGTTGTAAA CTTAGTTGCA CTTCAGTTCC TTTCATCCCT 
1401 TCACAAAATT TTGTTTCACA TTCATGCAGC AAATATGGGC TGAGGTGCCA 
1451 GACCTGTACC TGGGCTTGGT GCGTTTCAAA TTTCAGACCA GTTCTTTGGG 
1501 CTGGGTCAAG GCAAAGCTCA GTCGTCCCAG CAGCACCTCA GCCATCTGTA 
1551 GAAGGTTCTA CCATTACCAC GGTTTCAGCT TCCTCTAAAC TTCTCACCCG 
1601 CTTCTCCTGG CAATCTGTCA GAACGGTGTC ATCCTGGGGA AGAGAAGGAG 
1651 CTTGGGTGCA TTTGCCCTCA TCCTGAGAAG GCCAGAATAC TGGAGACCAG 
1701 CGTGAACCCT CACCCAGAGT CAGGGGAAGA TTTAGAAACA GTGACACCTG 
1751 CATATAGAAT TTTGATTCCT TGAAGAGCCT ATTTAGTTCC ATAAAATTGG 
1801 AGAACTGCTG AAGGTCAGTA ATTCCGACTT TCTCAGCAGT GGTGTCTCTG 
1851 AATTACTGCA AAGGGTAAAA AAAAAAAAAA AAAAAACTTA TCGATACCGT 
1901 CGACCTCGAT GATGATGATG ATGATGTCGA C 


BLAST Results 


Entry HS147M19 from database EMBL: 

Homo sapiens DNA sequence from PAC 147M19 on chromosome 6p22.1-22.3 

Contains an unknown gene, ESTs and GSSs. 

Score = 5540, P = 4.1e-275, identities = 1114/1120 

3 exons 592-1884 

Entry HS608E8 from database EMBL: 

Human DNA sequence SEQUENCING IN PROGRESS from clone 608E8 

Score = 797, P = 1.2e-78, identities = 161/163 

6 exons 1-592 


Medline entries 


90294724: 

The involucrin gene of the gibbon: The middle region shared by the 
hominoids 


Peptide information for frame 2 


ORF from 107 bp to 1015 bp; peptide length: 303 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (97-119) 


1 MLETLRERLL SVQQDFTSGL KTLSDKSREA KVKSKPRTVP FLPKYSAGLE 
51 LLSRYEDTWA ALHRRAKDCA SAGELVDSEV VMLSAHWEKK KTSLVELQEQ 
101 LQQLPALIAD LESMTANLTH LEASFEEVEN NLLHLEDLCG QCELERCKHM 
151 QSQQLENYKK NKRKELETFK AELDAEHAQK VLEMEHTQQM KLKERQKFFE 
201 EAFQQDMEQY LSTGYLQIAE RREPIGSMSS MEVNVDMLEQ MVLMDISDQE 
251 ALDVFLNSGG EENTVLSPAL GRVDKLALAE PGQYRCHSPP KVRRENHLPV 
301 TYA 
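
The LEUCINE_ZIPPER position reported for this peptide can be checked directly against the sequence. The sketch below assumes the usual Prosite PS00029 consensus, L-x(6)-L-x(6)-L-x(6)-L, written here as a regular expression; the names `PEPTIDE_1_150` and `find_zipper` are hypothetical. Only the first 150 residues of the peptide are included, which is enough to cover the reported motif.

```python
import re

# Residues 1-150 of the peptide listed above (position numbers removed)
PEPTIDE_1_150 = "".join("""
MLETLRERLL SVQQDFTSGL KTLSDKSREA KVKSKPRTVP FLPKYSAGLE
LLSRYEDTWA ALHRRAKDCA SAGELVDSEV VMLSAHWEKK KTSLVELQEQ
LQQLPALIAD LESMTANLTH LEASFEEVEN NLLHLEDLCG QCELERCKHM
""".split())

# Assumed consensus: four leucines spaced seven residues apart
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_zipper(peptide):
    """Return 1-based (first, last) residue of the first match, or None."""
    match = LEUCINE_ZIPPER.search(peptide)
    if match is None:
        return None
    return match.start() + 1, match.end()


print(find_zipper(PEPTIDE_1_150))
```

Under this assumption the match spans residues 97 to 118, with leucines at 97, 104, 111 and 118. The report quotes the range as 97-119, which suggests that its end coordinate is the start plus the pattern length rather than the last matched residue; this reading is an inference from the ranges in this section, not something the reports state.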

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2k19, frame 2 

TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, 
partial cds., N = 1, Score = 137, P = 4.8e-06 

PIR:I37037 involucrin - common gibbon, N = 1, Score = 124, P = 7.4e-05 

PIR:A57013 early endosome antigen 1 - human, N = 1, Score = 128, P = 
9.5e-05 


>TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, partial 
cds. 

Length = 808 

HSPs: 


Score = 137 (20.6 bits), Expect = 4.8e-06, P = 4.8e-06 
Identities = 59/222 (26%), Positives = 103/222 (46%) 

Query:   2 LETLRERLLSVQQDFTSGLKTL---SDKSREAKVKS-KPRTVPFLPKYSAGLELLSRYED 57 
           L TL E L S ++ LK D+ R +++S + K +A L+ E 
Sbjct: 434 LATLEEAL-SEKERIIERLKEQRERDDRERLEEIESFRKENKDLKEKVNALQAELTEKES 492 

Query:  58 TWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTAN 117 
           + L A ASAG DS++ L E+KK +L+ QL++ ID M 
Sbjct: 493 SLIDLKEHASSLASAGLKRDSKLKSLEIAIEQKKEECSKLEAQLKKAHN-IEDDSRMNPE 551 

Query: 118 LTHLEASFEEVENNLLHLEDLCG--QCELERCKHMQSQQLENYKKNKRK---ELETFKAE 172 
           ++++ + D CG Q E++R + +++EN K +K K ELE+ 
Sbjct: 552 FAD---QIKQLDKEASYYRDECGKAQAEVDRLLEIL-KEVENEKNDKDKKIAELESLTLR 607 

Query: 173 LDAEHAQKVLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAE 220 
           + +KV ++H QQ++ K+ + EE +++ ++ +LQI E 
Sbjct: 608 HMKDQNKKVANLKHNQQLEKKKNAQLLEEVRRREDSMADNSQHLQIEE 655 


Score = 100 (15.0 bits), Expect = 6.2e-02, P = 6.0e-02 
Identities = 44/156 (28%), Positives = 76/156 (48%) 

Query:  57 DTWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPAL-IADLESMT 115 
           D A+ +R +C A VD + +L E +K + +L+ L + D 
Sbjct: 560 DKEASYYR--DECGKAQAEVDRLLEILK-EVENEKNDKDKKIAELESLTLRHMKDQNKKV 616 

Query: 116 ANLTHLEASFEEVENNLLHLEDLCGQCE--LERCKHMQSQQLENYKKNKRKELETFKAEL 173 
           [...] + + + +H+Q ++L N + R+EL+ KA L 
Sbjct: 617 ANLKHNQ-QLEKKKNAQL-LEEVRRREDSMADNSQHLQIEELMNALEKTRQELDATKARL 674 

Query: 174 DAEHAQKVLEME-HTQQMKLKERQKFFEEAFQQDMEQYLS 212 
           A Q + E E H +++ ER+K EE + E L+ 
Sbjct: 675 -ASTQQSLAEKEAHLANLRI-ERRKQLEEILEMKQEALLA 712 

Pedant information for DKFZphfbr2_2k19, frame 2 

Report for DKFZphfbr2_2k19.2 

[LENGTH] 303 

[MW] 34814.78 

[pI] 5.23 

[PROSITE] LEUCINE_ZIPPER 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 3.63 % 

[KW] COILED_COIL 14.52 % 

SEQ MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWA 
SEG 
COILS [...] 

SEQ ALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTH 
SEG [...]xxxxxxxxxxx[...] 
COILS [...]CCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LEASFEEVENNLLHLEDLCGQCELERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQK 
SEG 
COILS [...] 

SEQ VLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQ 
SEG 
COILS [...] 

SEQ MVLMDISDQEALDVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPV 
SEG 
COILS [...] 

SEQ TYA 
SEG 
PRD ccc 
COILS 


Prosite for DKFZphfbr2_2k19.2 
PS00029 97->119 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphfbr2_2k19.2) 


DKFZphfbr2_2k14 
group: cell cycle 

DKFZphfbr2_2k14 encodes a novel 335 amino acid protein with strong similarity to the Rattus norvegicus 
IAG2 "implantation-associated protein" and to the product of the human N33 tumour-suppressor gene. 

Tumour-suppressor genes are known to be involved in the control of cell growth and division, 
interacting with proteins which control the cell cycle. The N33 gene is significantly 
methylated in tumour cells, a mechanism by which tumour-suppressor genes are inactivated in 
cancer. In addition, the novel protein contains an RGD cell attachment site. Therefore the 
novel protein is the product of a new putative tumour-suppressor gene. 

The new protein can find application in modulating/blocking the cell cycle and in the therapy 
of tumours. 


strong similarity to human N33 tumour-suppressor gene 
complete cDNA, complete cds, EST hits 

potential start at bp 30 matches Kozak consensus ANCatgG 
potential transmembrane protein (4 TM) 

similarity to yeast OST3p (oligosaccharyltransferase gamma chain) 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2241 bp 

Poly A stretch at pos. 2221, no polyadenylation signal found 


1 TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG TTGGCGGTTT 
51 TGGTGTGTCT CTGTGACCAT GGTGGTGGCG CTGCTCATCG TTTGCGACGT 
101 TCCCTCAGCC TCTGCCCAAA GAAAGAAGGA GATGGTGTTA TCAGAAAAGG 
151 TTAGTCAGCT GATGGAATGG ACTAACAAAA GACCTGTAAT AAGAATGAAT 
201 GGAGACAAGT TCCGTCGCCT TGTGAAAGCC CCACCGAGAA ATTACTCCGT 
251 TATCGTCATG TTCACTGCTC TCCAACTGCA TAGACAGTGT GTCGTTTGCA 
301 AGCAAGCTGA TGAAGAATTC CAGATCCTGG CAAACTCCTG GCGATACTCC 
351 AGTGCATTCA CCAACAGGAT ATTTTTTGCC ATGGTGGATT TTGATGAAGG 
401 CTCTGATGTA TTTCAGATGC TAAACATGAA TTCAGCTCCA ACTTTCATCA 
451 ACTTTCCTGC AAAAGGGAAA CCCAAACGGG GTGATACATA TGAGTTACAG 
501 GTGCGGGGTT TTTCAGCTGA GCAGATTGCC CGGTGGATCG CCGACAGAAC 
551 TGATGTCAAT ATTAGAGTGA TTAGACCCCC AAATTATGCT GGTCCCCTTA 
601 TGTTGGGATT GCTTTTGGCT GTTATTGGTG GACTTGTGTA TCTTCGAAGA 
651 AGTAATATGG AATTTCTCTT TAATAAAACT GGATGGGCTT TTGCAGCTTT 
701 GTGTTTTGTG CTTGCTATGA CATCTGGTCA AATGTGGAAC CATATAAGAG 
751 GACCACCATA TGCCCATAAG AATCCCCACA CGGGACATGT GAATTATATC 
801 CATGGAAGCA GTCAAGCCCA GTTTGTAGCT GAAACACACA TTGTTCTTCT 
851 GTTTAATGGT GGAGTTACCT TAGGAATGGT GCTTTTGTGT GAAGCTGCTA 
901 CCTCTGACAT GGATATTGGA AAGCGAAAGA TAATGTGTGT GGCTGGTATT 
951 GGACTTGTTG TATTATTCTT CAGTTGGATG CTCTCTATTT TTAGATCTAA 
1001 ATATCATGGC TACCCATACA GCTTTCTGAT GAGTTAAAAA GGTCCCAGAG 
1051 ATATATAGAC ACTGGAGTAC TGGAAATTGA AAAACGAAAA TCGTGTGTGT 
1101 TTGAAAAGAA GAATGCAACT TGTATATTCT GTATTACCTC TTTTTTTCAA 
1151 GTGATTTAAA TAGTTAATCA TTTAACCAAA GAAGATGTGT AGTGCCTTAA 
1201 CAAGCAATCC TCTGTCAAAA TCTGAGGTAT TTGAAAATAA TTATCCTCTT 
1251 AACCTTCTCT TCCCAGTGAA CTTTATGGAA CATTTAATTT AGTACAATTA 
1301 AGTATATTAT AAAAATTGTA AAACTACTAC TTTGTTTTAG TTAGAACAAA 
1351 GCTCAAAACT ACTTTAGTTA ACTTGGTCAT CTGATCTTAT ATTGCCTTAT 
1401 CCAAAGATGG GGAAAGTAAG TCCTGACCAG GTGTTCCCAC ATATGCCTGT 
1451 TACAGATAAC TACATTAGGA ATTCATTCTT AGCTTCTTCA TCTTTGTGTG 
1501 GATGTGTATA CTTTACGCAT CTTTCCTTTT GAGTAGAGAA ATTATGTGTG 
1551 TCATGTGGTC TTCTGAAAAT GGAACACCAT TCTTCAGAGC ACACGTCTAG 
1601 CCCTCAGCAA GACAGTTGTT TCTCCTCCTC CTTGCATATT TCCTACTGCG 
1651 CTCCAGCCTG AGTGATAGAG TGAGACTCTG TCTCAAAAAA AAAGTATCTC 
1701 TAAATACAGG ATTATAATTT CTGCTTGAGT ATGGTGTTAA CTACCTTGTA 
1751 TTTAGAAAGA TTTCAGATTC ATTCCATCTC CTTAGTTTTC TTTTAAGGTG 
1801 ACCCATCTGT GATAAAAATA TAGCTTAGTG CTAAAATCAG TGTAACTTAT 
1851 ACATGGCCTA AAATGTTTCT ACAAATTAGA GTTTGTCACT TATTCCATTT 
1901 GTACCTAAGA GAAAAATAGG CTCAGTTAGA AAAGGACTCC CTGGCCAGGC 
1951 GCAGTGACTT ACGCCTGTAA TCTCAGCACT TTGGGAGGCC AAGGCAGGCA 
2001 GATCACGAGG TCAGGAGTTC GAGACCATCC TGGCCAACAT GGTGAAACCC 
2051 CGTCTCTACT AAAAATATAA AAATTAGCTG GGTGTGGTGG CAGGAGCCTG 
2101 TAATCCCAGC TGCACAGGAG GCTGAGGCAC GAGAATCACT TGAACTCAGG 
2151 AGATGGAGGT TTCAGTGAGC CGAGATCACG CCACTGCACT CCAGCCTGGC 
2201 AACAGAGCGA GACTCCATCT CAAAAAAAAA AAAAAAAAAA A 


BLAST Results 


No BLAST result 


Medline entries 


96299740: 

Structure and methylation-associated silencing of a gene 
within a homozygously deleted region of human chromosome 
band 8p22. 

97243398: 

Tumour-suppressor genes in prostatic oncogenesis: a 
positional approach. 

98334474: 

Concordant methylation of the ER and N33 genes in 
glioblastoma multiforme. 


Peptide information for frame 3 


ORF from 30 bp to 1034 bp; peptide length: 335 
Category: strong similarity to known protein 


1 MAARWRFWCV SVTMVVALLI VCDVPSASAQ RKKEMVLSEK VSQLMEWTNK 
51 RPVIRMNGDK FRRLVKAPPR NYSVIVMFTA LQLHRQCVVC KQADEEFQIL 
101 ANSWRYSSAF TNRIFFAMVD FDEGSDVFQM LNMNSAPTFI NFPAKGKPKR 
151 GDTYELQVRG FSAEQIARWI ADRTDVNIRV IRPPNYAGPL MLGLLLAVIG 
201 GLVYLRRSNM EFLFNKTGWA FAALCFVLAM TSGQMWNHIR GPPYAHKNPH 
251 TGHVNYIHGS SQAQFVAETH IVLLFNGGVT LGMVLLCEAA TSDMDIGKRK 
301 IMCVAGIGLV VLFFSWMLSI FRSKYHGYPY SFLMS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2kl4, frame 3 

TREMBL:RNAF8554_1 gene: •'IAG2-; product: "implantation-associated 
protein"; Rattus norvegicus implantation-associated protein (IAG2> 
mRNA, partial cds,, N = 1, Score = 1560, P = 3.4e-160 

PIR:G02297 gene N33 protein - human, N = 1, Score = 1256, P = 5.6e-128 
TREMBL:HSN33S11_1 gene: "N33"; product: "N33 protein form 2«? Human 
- 1252^^P^" 1°5^ 127^"^"^^ ^^^^ complete cds., N = 1, Score 


>TREMBL:RNAr8554_l gene: "IAG2"; product: "implantation-associated protein"; 

Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 
Length » 308 

HSPs : 

Score - 1560 (234.1 bits). Expect » 3,4e-160, P « 3.4e-160 
Identities = 295/307 (96%), Positives = 299/307 (97%) 

Query: 29 AQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDKFRRLVKAPPRNYSVIVMFTALQLHRQCV 88 

AQRKKE VL ekv qlmewtn+rpvirmngdkfr lvkapprnysvivmftalqlhrqcv 
sbjct: 2 aqrkkekylvekviqlmewtnqrpvirmngdkfrplvkapprnysvivmftalqlhrqcv 61 

Query: 89 VCKQADEEFQILANSWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPAKGKP 148 

VCKQADEEFQILAN WRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFP kgkp 
Sbjct: 62 VCKQADEEFQILANFWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPPKGKP 121 

Query: 149 KRGDTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 208 

KR DTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 
Sbjct: 122 KRADTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 181 

Query: 209 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 268 
NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 
Sbjct: 182 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 241 

Query: 269 THIVLLFNGGVTLGMVLLCEAATSDMDIGKRKIMCVAGIGLVVLFFSWMLSIFRSKYHGY 328 

THIVLLFNGGVTLGMVLLCEAA SDMDIGKR++MC+AGIGLVVLFFSWMLSIFRSKYHGY 
Sbjct: 242 THIVLLFNGGVTLGMVLLCEAAASDMDIGKRRMMCIAGIGLVVLFFSWMLSIFRSKYHGY 301 

Query: 329 PYSFLMS 335 

PYSFLMS 
Sbjct: 302 PYSFLMS 308 
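
The percentages printed in the HSP header follow directly from the match counts over the aligned length. The following is a minimal sketch, assuming the report truncates the percentage to an integer rather than rounding it (an assumption that is consistent with the figures quoted in this listing); the function name `hsp_percent` is hypothetical and not part of any BLAST tool.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Return the integer percentage of matching columns in an HSP.

    Assumption: the percentage is truncated (floor), not rounded.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


# Counts taken from the HSP header above.
print(hsp_percent(295, 307))  # identities
print(hsp_percent(299, 307))  # positives
```

Applied to the header above, the two calls reproduce the reported 96% identities and 97% positives.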


Pedant information for DKFZphfbr2_2kl4, frame 3 


Report for DKFZphfbr2_2kl4.3 


[LENGTH] 335 
[MW] 38036.83 
[pI] 9.68 
[HOMOL] TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein"; Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 1e-161 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOR085w] 4e-14 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YOR085w] 4e-14 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YOR085w] 4e-14 
[EC] 2.4.1.119 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 1e-12 
[PIRKW] glycosyltransferase 1e-12 
[PIRKW] transmembrane protein 6e-69 
[PIRKW] hexosyltransferase 1e-12 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] SIGNAL_PEPTIDE 30 
[KW] TRANSMEMBRANE 4 
[KW] LOW_COMPLEXITY 5.97 % 


SEQ MAARWRFWCVSVTMVVALLIVCDVPSASAQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDK 

SEG 

PRD cccceeeeeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccceeeeecccc 

MEM 

SEQ FRRLVKAPPRNYSVIVMFTALQLHRQCVVCKQADEEFQILANSWRYSSAFTNRIFFAMVD 

SEG 

PRD ceeeeeccccccceeeehhhhhhccceeeehhhhhhhhhhhhhcccccccccceeeeeec 

MEM 

SEQ FDEGSDVFQMLNMNSAPTFINFPAKGKPKRGDTYELQVRGFSAEQIARWIADRTDVNIRV 

SEG 

PRD cccccceeeecccccccceeeccccccccccceeeeeeeccchhhhhhhhhhhhheeeee 

MEM M 

SEQ IRPPNYAGPLMLGLLLAVIGGLVYLRRSNMEFLFNKTGWAFAALCFVLAMTSGQMWNHIR 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD eccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeec 

MEM MMMMMMMNIMMMMMMMMMMhIMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ GPPYAHKNPHTGHVNYIHGSSQAQFVAETHIVLLFNGGVTLGMVLLCEAATSDMDIGKRK 

SEG 

PRD ccccccccccccceeeecccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IMCVAGIGLVVLFFSWMLSIFRSKYHGYPYSFLMS 

SEG 

PRD eeeecccceeeeeehhhhhhhhhhccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM 


Prosite for DKFZphfbr2_2kl4.3 


PS00001 71->75 ASN_GLYCOSYLATION PDOC00001 
PS00001 215->219 ASN_GLYCOSYLATION PDOC00001 
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005 
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005 
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006 
PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006 
PS00008 193->199 MYRISTYL PDOC00008 
PS00008 233->239 MYRISTYL PDOC00008 
PS00008 259->265 MYRISTYL PDOC00008 
PS00008 278->284 MYRISTYL PDOC00008 
PS00009 296->300 AMIDATION PDOC00009 
PS00016 150->153 RGD PDOC00016 


(No Pfam data available for DKFZphfbr2_2kl4.3) 




WO 01/12659 


PCT/IB00/01496 


DKFZphfbr2_3cl8 


group: nucleic acid management 

DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with strong similarity to Mus musculus 
RNA helicase and several RNA-dependent ATPases from the DEAD box family. 

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 


strong similarity to RNA helicase and RNA-dependent ATPase 
from the DEAD box family 
group helicases 

Summary DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with 
similarity to DEAD-box subfamily ATP-dependent RNA helicases. 
Deletion of the yeast homologue DBP5 is lethal. 


strong similarity to RNA helicase and RNA-dependent ATPase from the 
DEAD box family 

complete cDNA, EST hits 
complete cds ATG at Bp 109 

Sequenced by AGOWA 

Locus: /map="87.50 cR from top of Chr16 linkage group" 

Insert length: 1713 bp 

Poly A stretch at pos. 1696, no polyadenylation signal found 

1 TGGGGTAGTG GGGCTGGAGC AGAGCCTGCC GCGAACCCCC GGAGCCCACG 
51 ATCCCTCGTG CCATCCCTCG AATCCACCAG CACGAGCGTC CCACCCGCGC 
101 CTGGGACCAT GGCCACTGAC TCATGGGCCC TGGCGGTGGA CGAGCAGGAA 
151 GCTGCGGCTG AGTCGTTGAG CAACTTGCAT CTTAAGGAAG AGAAAATCAA 
201 ACCAGATACC AATGGTGCTG TTGTCAAGAC CAATGCCAAT GCAGAGAAGA 
251 CAGATGAAGA AGAGAAAGAG GACAGAGCTG CCCAGTCCTT ACTCAACAAG 
301 CTGATCAGAA GCAACCTTGT TGATAACACA AACCAAGTGG AAGTCCTGCA 
351 GCGGGATCCA AACTCCCCTC TGTACTCGGT GAAGTCTTTT GAAGAGCTTC 
401 GGCTCCCACA GAACTTAATT GCCCAATCTC AGTCTGGTAC TGGTAAAACA 
451 GCTGCCTTCG TGCTGGCCAT GCTTAGCCAA GTAGAACCTG CAAACAAATA 
501 CCCCCAGTGT CTATGTCTCT CCCCAACGTA TGAGCTCGCC CTCCAAACAG 
551 GAAAAGTGAT TGAACAAATG GGCAAATTTT ACCCTGAACT GAAGCTAGCT 
601 TATGCTGTTC GAGGCAATAA ATTGGAAAGA GGCCAGAAGA TCAGTGAGCA 
651 GATTGTCATT GGCACCCCTG GGACTGTGCT GGACTGGTGC TCCAAGCTCA 
701 AGTTCATTGA TCCCAAGAAA ATCAAGGTGT TTGTTCTGGA TGAGGCTGAT 
751 GTCATGATAG CCACTCAGGG CCACCAAGAT CAGAGCATCC GCATCCAGAG 
801 GATGCTGCCC AGGAACTGCC AGATGCTGCT TTTCTCCGCC ACCTTTGAAG 
851 ACTCTGTGTG GAAGTTTGCC CAGAAAGTGG TCCCAGACCC AAACGTTATC 
901 AAACTGAAGC GTGAGGAAGA GACCCTGGAC ACCATCAAGC AGTACTATGT 
951 CCTGTGCAGC AGCAGAGACG AGAAGTTCCA GGCCTTGTGT AACCTCTACG 
1001 GGGCCATCAC CATTGCTCAA GCCATGATCT TCTGCCATAC TCGCAAAACA 
1051 GCTAGTTGGC TGGCAGCAGA GCTCTCAAAA GAAGGCCACC AGGTGGCTCT 
1101 GCTGAGTGGG GAGATGATGG TGGAACAGAG GGCTGCAGTG ATTGAGCGCT 
1151 TCCGAGAGGG CAAAGAGAAG GTTTTGGTGA CCACCAACGT GTGTGCCCGC 
1201 GGCATTGATG TTGAACAAGT GTCTGTCGTC ATCAACTTTG ATCTTCCCGT 
1251 GGACAAGGAC GGGAATCCTG ACAATGAGAC CTACCTGCAC CGGATCGGGC 
1301 GCACGGGCCG CTTTGGCAAG AGGGGCCTGG CAGTGAACAT GGTGGACAGC 
1351 AAGCACAGCA TGAACATCCT GAACAGAATC CAGGAGCATT TTAATAAGAA 
1401 GATAGAAAGA TTGGACACAG ATGATTTGGA CGAGATTGAG AAAATAGCCA 
1451 ACTGAGAAGC TCCACCAGCC ACTGATGCCA GCCCTGGCAC TGCCCCTGCA 
1501 CAGGAGACAA GTGCGTTCAG GGCACAGGCC CCGACATCAC CCCAAGGACA 
1551 ACGGCACAAG TAGAGAGAAA CTACCTACCT CACTTCAAAT TATGTTTGGA 
1601 CTTGACAAAA ATGTATGCAA ATGATGGGGG ATGGTAGAAA AAAATTATTT 
1651 ACACAACCTT GGAAGATTAG GCATGAATAC ACAGAGATTT ACCTTTAAAA 
1701 AAAAAAAAAA AAA 


BLAST Results 

Entry G36496 from database EMBL: 
SHGC-53094 Human Homo sapiens STS cDNA, 
Length = 459 
Minus Strand HSPs: 

Score = 1693 (254.0 bits), Expect = 2.8e-70, P = 2.8e-70 
Identities = 369/387 (95%), Positives = 369/387 (95%) 

Entry G44014 from database EMBLNEW: 

WIAF-3643-STS Human THudson SANGER Homo sapiens STS genomic, sequence 
tagged site. 
Score = 901, P = 2.3e-35, identities = 183/185 


Medline entries 


94192995: 

Gene 1994 Mar 25; 140 (2) : 171-177 

Mouse erythroid cells express multiple putative RNA helicase genes 
exhibiting high sequence conservation from yeast to mammals. 


Peptide information for frame 1 


ORF from 109 bp to 1452 bp; peptide length: 448 
Category: strong similarity to known protein 


1 MATDSWALAV DEQEAAAESL SNLHLKEEKI KPDTNGAVVK TNANAEKTDE 

51 EEKEDRAAQS LLNKLIRSNL VDNTNQVEVL QRDPNSPLYS VKSFEELRLP 

101 QNLIAQSQSG TGKTAAFVLA MLSQVEPANK YPQCLCLSPT YELALQTGKV 

151 IEQMGKFYPE LKLAYAVRGN KLERGQKISE QIVIGTPGTV LDWCSKLKFI 

201 DPKKIKVFVL DEADVMIATQ GHQDQSIRIQ RMLPRNCQML LFSATFEDSV 

251 WKFAQKVVPD PNVIKLKREE ETLDTIKQYY VLCSSRDEKF QALCNLYGAI 

301 TIAQAMIFCH TRKTASWLAA ELSKEGHQVA LLSGEMMVEQ RAAVIERFRE 

351 GKEKVLVTTN VCARGIDVEQ VSVVINFDLP VDKDGNPDNE TYLHRIGRTG 

401 RFGKRGLAVN MVDSKHSMNI LNRIQEHFNK KIERLDTDDL DEIEKIAN 
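
The stated peptide length can be cross-checked against the ORF coordinates: an ORF given by inclusive 1-based start and end positions spans `end - start + 1` nucleotides, and each codon contributes one residue. The sketch below assumes, as appears to be the case in this listing, that the stop codon lies outside the quoted coordinates; the function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of residues encoded by an ORF with inclusive 1-based coordinates.

    Assumption: the stop codon is not included in the coordinate range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 109 bp to 1452 bp, as stated above.
print(orf_peptide_length(109, 1452))
```

For the coordinates quoted above the result agrees with the stated peptide length of 448.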

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3cl8, frame 1 

PIR:I49731 RNA helicase - mouse, N = 2, Score = 1758, P = 3.8e-223 

TREMBL:AF005239_1 gene: "Dbp80"; product: "DEAD-box helicase"; 
Drosophila melanogaster DEAD-box helicase (Dbp80) mRNA, complete cds., 
N = 2, Score = 1142, P = 1.8e-125 

SWISSPROT:YB66_SCHPO PUTATIVE ATP-DEPENDENT RNA HELICASE C12C2.06., N = 
2, Score = 911, P = 5.5e-103 

PIR:S66920 probable RNA helicase CA5/6 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 887, P = 1.9e-98 


>PIR:I49731 RNA helicase - mouse 
Length = 478 

HSPs: 

Score = 1758 (263.8 bits). Expect = 3.8e-223, Sum P(2) = 3.8e-223 
Identities = 338/349 (96%), Positives = 349/349 (100%) 

Query: 100 PQNLIAQSQSGTGKTAAFVLAMLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYP 159 

PQNLIAQSQSGTGKTAAFVLAMLS+VEPA++YPQCLCLSPTYELALQTGKVIEQMGKF+P 
Sbjct: 130 PQNLIAQSQSGTGKTAAFVLAMLSRVEPADRYPQCLCLSPTYELALQTGKVIEQMGKFHP 189 

Query: 160 ELKLAYAVRGNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 219 

ELKLAYAVRGNKLERGQK+SEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 
Sbjct: 190 ELKLAYAVRGNKLERGQKVSEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 249 

Query: 220 QGHQDQSIRIQRMLPRNCQMLLFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQY 279 
           QGHQDQSIRIQR++PRNCQMLLFSATFEDSVWKFAQKVVPDPN+IKLKREEETLDTIKQY 
Sbjct: 250 QGHQDQSIRIQRIVPRNCQMLLFSATFEDSVWKFAQKVVPDPNIIKLKREEETLDTIKQY 309 

Query: 280 YVLCSSRDEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 339 
           YVLC++R+EKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 
Sbjct: 310 YVLCNNREEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 369 

Query: 340 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 399 
           QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 
Sbjct: 370 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 429 

Query: 400 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 448 
           GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 
Sbjct: 430 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 478 

Score = 419 (62.9 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223 
Identities = 94/136 (69%), Positives = 104/136 (76%) 

Query:   1 MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 60 
           MATDSWALAVDEQEAA +S+S+L +KEEK K DTNG V+KT+  AEKT+EEEKEDRAAQS 
Sbjct:   1 MATDSWALAVDEQEAAVKSMSSLQIKEEKAKSDTNG-VIKTSTTAEKTEEEEKEDRAAQS 59 

Query:  61 LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRL-PQNL---IAQSQSGTGKTAA 116 
           LLNKLIRSNLVDNTNQVEVLQRDP+SPLYSVKSFEELRL PQ L    A   +   K 
Sbjct:  60 LLNKLIRSNLVDNTNQVEVLQRDPSSPLYSVKSFEELRLKPQLLQGVYAMGFNRPSKIQE 119 

Query: 117 FVLAMLSQVEPANKYPQ 133 
             L M+    P N   Q 
Sbjct: 120 NALPMMLAEPPQNLIAQ 136 


Pedant information for DKFZphfbr2_3cl8, frame 1 


Report for DKFZphfbr2_3cl8.1 


[LENGTH] 448 
[MW] 50490.07 
[pI] 5.83 
[HOMOL] PIR:I49731 RNA helicase - mouse 0.0 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 1e-102 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YDR021w] 2e-65 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR021w] 2e-65 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YJL138c] 1e-63 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YJL138c] 1e-63 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 2e-48 
[FUNCAT] [...] ribosome biogenesis [H. influenzae, HI0231 RNA] 9e-48 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-43 
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-39 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-35 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 9e-27 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 8e-26 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 1e-23 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 9e-08 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-05 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-05 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIR002c] 7e-04 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 4e-64 
[PIRKW] RNA binding 1e-64 
[PIRKW] DEAD box 4e-64 
[PIRKW] transmembrane protein 3e-22 
[PIRKW] DNA binding 2e-32 
[PIRKW] ATP 1e-101 
[PIRKW] purine nucleotide binding 4e-64 
[PIRKW] P-loop 1e-101 
[PIRKW] hydrolase 4e-43 
[PIRKW] protein biosynthesis 1e-64 
[PIRKW] ATP binding 2e-35 
[SUPFAM] WW repeat homology 3e-29 
[SUPFAM] translation initiation factor eIF-4A 1e-64 
[SUPFAM] DEAD/H box helicase homology 1e-101 
[SUPFAM] DNA helicase recG 2e-06 
[SUPFAM] unassigned DEAD/H box helicases 1e-101 
[SUPFAM] ATP-dependent RNA helicase DBP1 9e-33 
[SUPFAM] ATP-dependent RNA helicase DHH1 4e-48 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 3e-29 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 




SEQ MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 

PRD ccchhhhhhhhhhhhhhhhcccchhhhhhhcccccceeeeeehhhhhhhhhhhhhhhhhh 

SEQ LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGKTAAFVLA 

PRD hhhhhhhhhcccccceeeeeeccccccceeehhhhhhhhccceeeeeccccccchhhhhh 

SEQ MLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYPELKLAYAVRGNKLERGQKISE 

PRD hhhhhhhhhccceeeeeccchhhhhhhhhhhhhhccccccccceeeccccchhhhhhhhe 

SEQ QIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIATQGHQDQSIRIQRMLPRNCQML 

PRD eeeecccccchhhhhhhhhhcccceeeeeecchhhhhhhccchhhhhhhhhhccccceee 

SEQ LFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQYYVLCSSRDEKFQALCNLYGAI 

PRD eeeccccchhhhhhhhhhcccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhch 

SEQ TIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTN 

PRD hhhhhheeecchhhhhhhhhhhhhccceeeeecccchhhhhhhhhhhhccccceeeeeec 

SEQ VCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFGKRGLAVNMVDSKHSMNI 

PRD ccccccceeeeeeeeecccccccccccccceeeeeecccccccccceeeeeeeccchhhh 

SEQ LNRIQEHFNKKIERLDTDDLDEIEKIAN 

PRD hhhhhhhhhhhccccccccchhhhhccc 


Prosite for DKFZphfbr2_3cl8.1 


PS00001 389->393 ASN_GLYCOSYLATION PDOC00001 
PS00002 109->113 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005 
PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005 
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005 
PS00005 311->314 PKC_PHOSPHO_SITE PDOC00005 
PS00005 399->402 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006 
PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006 
PS00006 284->288 CK2_PHOSPHO_SITE PDOC00006 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 175->181 MYRISTYL PDOC00008 
PS00008 185->191 MYRISTYL PDOC00008 
PS00008 385->391 MYRISTYL PDOC00008 
PS00008 406->412 MYRISTYL PDOC00008 
PS00009 402->406 AMIDATION PDOC00009 


Pfam for DKFZphfbr2_3cl8.1 


HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeG RDVMACAQTGSGK 
++ ++ +N ++ P E+ +++A++Q+G+GK 
Query 65 LIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGK 113 

HMM TAAFlIPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHM 
TAAF++ ML+++ + + PQ +L L+PT ELA+Q+ ++++++GK++ 
Query 114 TAAFVLAMLSQVEPAN--KYPQ--CLCLSPTYELALQTGKVIEQMGKFY 158 

HMM nglRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIER.gtldLDr 
+ ++ + ++ ++ +++ +++ +IVI+TPG ++D + +D ++ 
Query 159 PELKLAYAVR GNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKK 204 

HMM leMLVMDEADRMLD.MGFIDQIRrlMrqlPMpwNRQTMMFSATMPdelqE 
I+++V+DEAD M+ +G +DQ RI R++P +N Q ++FSAT+ D++ + 
Query 205 IKVFVLDEADVMIATQGHQDQSIRIQRMLP--RNCQMLLFSATFEDSVWK 252 

HMM LARrFMRNPIRInldMdElTtnBnIkQwYiyVerEMWKfdcLcrLIe* 
+A ++ +P I ++++E T++ +IKQ+Y+ + + ++KF +LC+L++ 
Query 253 FAQKVVPDPNVIKLKREEETLD-TIKQYYVLCSSRDEKFQALCNLYG 298 

HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR 
+L+ +L+++G +V+ + G M+ E+R ++++F++G+ +VL++T+V +R 
Query 316 SWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTNVCAR 364 

HMM GIDIPdVNHVINYDM....PWNPEq..YIQRIGRTgRIG* 
GID+++V++VIN+D+ + NP++ Y++RIGRTGR+G 
Query 365 GIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFG 403 


Medline 

PMID: 10322435 

"Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families." de la Cruz J, Kressler D, Linder 


DKFZphfbr2_3f16 


group: brain derived 

DKFZphfbr2_3f16 encodes a novel 127 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1514 bp 

Poly A stretch at pos. 1454, polyadenylation signal at pos. 1434 


1 GGGGGGACTG GAGAAGGGAG GCGGCGGGCG AAGCGCACGT CGAGCGGGGG 
51 AGCGGCGCTG CCTGTGGAGA TCCGCGGAGG CCGACAGGAT TCGTTGGCTG 
101 CCGTCCCCGC TGCTGTGCAT TGGGTTAAAA ACGACAACCA ACATCAGCCA 
151 TGAAAGATCC AAGTCGCAGC AGTACTAGCC CAAGCATCAT CAATGAAGAT 
201 GTGATTATTA ACGGTCATTC TCATGAAGAT GACAATCCAT TTGCAGAGTA 
251 CATGTGGATG GAAAATGAAG AAGAATTCAA CAGACAAATA GAAGAGGAGT 
301 TATGGGAAGA AGAATTTATT GAACGCTGTT TCCAAGAAAT GCTGGAAGAG 
351 GAAGAAGAGC ATGAATGGTT TATTCCAGCT CGAGATCTCC CACAAACTAT 
401 GGACCAAATC CAAGACCAGT TTAATGACCT TGTTATCAGT GAAGGCTCTT 
451 CTCTGGAAGA TCTTGTGGTC AAGAGCAATC TGAATCCAAA TGCAAAGGAG 
501 TTTGTTCCTG GGGTGAAGTA CGGAAATATT TGAGTAGACG GGGCCCTCTT 
551 TTGGTGGATG TAGCACAATT TCCACACTGT GAAGGCAGTA TTAGAAGACT 
601 TAATTGTAAA AGCACTCTTG TCACTGTGTT ACACTTATGC ATTGCCAAAG 
651 TTTTTGTTAG TCTTGCATGC TTAATAAAAG TGCTGAGACT GTTACTAAGT 
701 AAAAAGCTGT CAAACATTTA CTGAAAATAG AATTGGCCCC ATGGCTTGAT 
751 GTGAAGACAG CAAGGAAAGA AGCACCAGTC AAGTTGTGAA CAAGCACCAA 
801 ATTAAAAGAC CTAAACCTTA CCAAATTGTC TTTTTTTGAG GCTAATCTAT 
851 CACTTGTTAA TGTCTAAACT TTAAAATCAG TACATTTAAT TTGAGTTCCA 
901 ACTGTTAAGC ATATTTCTCA GACTTAAATT TGATTATGTC CCCATCAAAA 
951 AGAATCTCCA TTTTCTGAAG GTCTGTTAGT TAATTTGAGA TAATTTGTTA 
1001 AAGGCAAGTA TGTCATATTA CTGAGGCTAC AAGTTAGTCA GCAGATGAGT 
1051 GCCAGTCCAG CCTTTTCCGG TATGTTATTG TTAGAAATAT TGAGTTCTAA 
1101 TGTTACATCT GAGGAAGTAT GTAATTTGAG AATTGTAACT TCTAAGGGAT 
1151 TCACTGCATC ATAGCTATGC CTGTATGGAG TCTAACATAT GACCAATACC 
1201 AACCCATAAT CCAGCTGAAC AAAGATACTG TAACATTATG ATTTGAGTGG 
1251 TGCTTTTCCT TGCTTTGTTA ACCATCACGA GAGTCTGCAG CACAACTTTT 
1301 AACAAAGCTA GAACAGTTTT GGCTTCTTAA ACTTCATATT TGGGTAGGTT 
1351 AAGCTGCCAT ACGTGTTCAG TGTGAATAGT GTTTAAGTTG ATU^TATTGT 
1401 AAAAAAATTA TATTTTTTCA AAAATATTTA AAAAAATAAA TAATAGTAGA 
1451 ACTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGAAAAA 
1501 AAAAAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 150 bp to 530 bp; peptide length: 127 
Category: putative protein 


1 MKDPSRSSTS PSIINEDVII NGHSHEDDNP FAEYMWMENE EEFNRQIEEE 
51 LWEEEFIERC FQEMLEEEEE HEWFIPARDL PQTMDQIQDQ FNDLVISEGS 
101 SLEDLVVKSN LNPNAKEFVP GVKYGNI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3f16, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_3f16, frame 3 

Report for DKFZphfbr2_3f16.3 

[LENGTH] 127 

[MW] 14998.41 

[pI] 4.04 

[BLOCKS] BL01269D 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 27.56 % 

SEQ MKDPSRSSTSPSIINEDVIINGHSHEDDNPFAEYMWMENEEEFNRQIEEELWEEEFIERC 


SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccceeeecccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 


SEQ FQEMLEEEEEHEWFIPARDLPQTMDQIQDQFNDLVISEGSSLEDLVVKSNLNPNAKEFVP 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccccccchhhhhhhhhcceeeecccccceeeeecccccccccccc 

SEQ GVKYGNI 

SEG 

PRD ccccccc 
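
The LOW_COMPLEXITY keyword in the Pedant report can be related to the SEG track: it is the share of residues masked with `x` relative to the full peptide length. The sketch below assumes that the two SEG runs shown above cover 23 and 12 residues (35 in total) and that the value is rounded to two decimals; the function name `low_complexity_percent` is hypothetical.

```python
def low_complexity_percent(masked_residues: int, peptide_length: int) -> float:
    """Percentage of residues flagged as low complexity, rounded to 2 decimals."""
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    if not 0 <= masked_residues <= peptide_length:
        raise ValueError("masked_residues must lie between 0 and peptide_length")
    return round(100.0 * masked_residues / peptide_length, 2)


# Assumed SEG run lengths for the 127-residue peptide above: 23 + 12.
print(low_complexity_percent(23 + 12, 127))
```

Under these assumptions the result matches the 27.56 % given in the report.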


Prosite for DKFZphfbr2_3f16.3 

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00006 100->104 CK2_PHOSPHO_SITE PDOC00006 

PS00008 121->127 MYRISTYL PDOC00008 
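
Each Prosite hit line carries four fields: the pattern accession, the matched residue range, the pattern name and the documentation accession. A minimal sketch of how such a line could be parsed is given below; the regular expression and the function name `parse_prosite_hit` are assumptions made for illustration and do not belong to any Prosite or Pedant tool.

```python
import re
from typing import Optional, Tuple

# Assumed line layout: "PS00008 121->127 MYRISTYL PDOC00008"
HIT_PATTERN = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)\s+(PDOC\d{5})$")


def parse_prosite_hit(line: str) -> Optional[Tuple[str, int, int, str, str]]:
    """Split one hit line into (accession, start, end, name, doc accession).

    Returns None when the line does not follow the assumed layout.
    """
    match = HIT_PATTERN.match(line.strip())
    if match is None:
        return None
    accession, start, end, name, doc = match.groups()
    return accession, int(start), int(end), name, doc


print(parse_prosite_hit("PS00008 121->127 MYRISTYL PDOC00008"))
```

Lines that do not fit the layout, for example entries damaged during text extraction, yield `None` and can be reviewed by hand.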


(No Pfam data available for DKFZphfbr2_3f16.3) 


DKFZphfbr2_3g8 


group: metabolism 

DKFZphfbr2_3g8.1 encodes a novel 178 amino acid protein with similarity to yeast ARD1 protein. 

In yeast, ARD1 and NAT1 are required for the expression of an N-terminal protein 
acetyltransferase 1. NAT1 controls full repression of the silent mating type locus HML, 
sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1 complex. The new 
protein could be part of this or another NAT complex. 

The new protein can find application in modulating NAT assembly and action and therefore be 
important in metabolism of drugs and environmental mutagens. 


strong similarity to N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 homolog 
complete cDNA, complete cds, start at Bp 40, EST hits 
Sequenced by AGOWA 
Locus: /map="20" 
Insert length: 1030 bp 

Poly A stretch at pos. 1013, no polyadenylation signal found 

1 TGGGCTTGGC GAACGGTCTT CGGAAGCGGC GGCGGCGCGA TGACCACGCT 
51 ACGGGCCTTT ACCTGCGACG ACCTGTTCCG CTTCAACAAC ATTAACTTGG 
101 ATCCACTTAC AGAAACTTAT GGGATTCCTT TCTACCTACA ATACCTCGCC 
151 CACTGGCCAG AGTATTTCAT TGTTGCAGTG GCACCTGGTG GAGAATTAAT 
201 GGGTTATATT ATGGGTAAAG CAGAAGGCTC AGTAGCTAGG GAAGAATGGC 
251 ACGGGCACGT CACAGCTCTG TCTGTTGCCC CAGAATTTCG ACGCCTTGGT 
301 TTGGCTGCTA AACTTATGGA GTTACTAGAG GAGATTTCAG AAAGAAAGGG 
351 TGGGTTTTTT GTGGATCTCT TTGTAAGAGT ATCTAACCAA GTTGCAGTTA 
401 ACATGTACAA GCAGTTGGGC TACAGTGTAT ATAGGACGGT CATAGAGTAC 
451 TATTCGGCCA GCAACGGGGA GCCTGATGAG GACGCTTATG ATATGAGGAA 
501 AGCACTTTCC AGGGATACTG AGAAGAAATC CATCATACCA TTACCTCATC 
551 CTGTGAGGCC TGAAGACATT GAATAACCCT GGGCAGTGGT TCTTAGGCAG 
601 ATACTCTAGA TGCTTTATGG ACAATATTAT TTTCATTGGA TGATTCTGGA 
651 GCTCTATTAG GAGAAAAGTA ATCATTTTAG GTCTTAAAGA CTTCAAG/^ 
701 ATACAGGTTA TCAATTTATT TTAAATCTCA TTGTTTCCAG TTAGCAATAT 
751 CATACCTATT AAAGCTGTTC ATTGTAACAA AATTCAATCA AAAAGGCAGC 
801 TAGGTCAGAA GGAAACATAC CACTCTCATG GTTCATAGTA TTCACTGTAT 
851 GTATGCTAGG GAAAAGACTT GCTCCAGTCT CCTCCTCAGT TCTGTGCCTG 
901 AGAACCACTG CTGCATATAT TTGTTTTTAA ATTTTGTATT GAACTGTTAA 
951 TTGAAGCTTT AAAAGCATAT ATGAAATGTA TAAATCTAAG ATGTATAATA 
1001 CATTATTGAC TCCAAAAAAA AAAAAAAAAA 


BLAST Results 


Entry H5G0101 from database EMBL: 
human STS SHGC-35956. 
Length = 401 
Minus Strand HSPs: 

Score = 1417 (212.6 bits), Expect = 9.3e-58, P = 9.3e-58 
Identities = 301/311 (96%) 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 40 bp to 573 bp; peptide length: 178 
Category: strong similarity to known protein 


1 MTTLRAFTCD DLFRFNNINL DPLTETYGIP FYLQYLAHWP EYFIVAVAPG 
51 GELMGYIMGK AEGSVAREEW HGHVTALSVA PEFRRLGLAA KLMELLEEIS 
101 ERKGGFFVDL FVRVSNQVAV NMYKQLGYSV YRTVIEYYSA SNGEPDEDAY 
151 DMRKALSRDT EKKSIIPLPH PVRPEDIE 
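
The peptide above can be reproduced from the nucleotide listing by reading codons from the stated ORF start. The following sketch assumes the standard genetic code and 1-based coordinates; the helper `translate` and the constant names are hypothetical, and only the first 80 bases of the insert are used so that the example stays short.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate from a 1-based start position until a stop codon or the end.

    Whitespace in the input is ignored; unknown codons are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 80 bases of the insert, grouped as in the listing.
fragment = (
    "TGGGCTTGGC GAACGGTCTT CGGAAGCGGC GGCGGCGCGA TGACCACGCT "
    "ACGGGCCTTT ACCTGCGACG ACCTGTTCCG"
)
print(translate(fragment, start_bp=40))
```

Starting at bp 40, the fragment yields the first thirteen residues of the peptide given above.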

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3g8, frame 1 

TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal 
acetyltransferase complex subunit"; S.pombe chromosome III cosmid 
C16C4., N = 1, Score = 475, P = 3.2e-45 

SWISSPROT:ARDH_LEIDO N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT 
HOMOLOG., N = 1, Score = 451, P = 1.1e-42 

PIR:S69021 hypothetical protein YPR131c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 382, P = 2.3e-35 


>TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal 
acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 
Length = 180 

HSPs: 

Score = 475 (71.3 bits), Expect = 3.2e-45, P = 3.2e-45 
Identities = 96/165 (58%), Positives = 118/165 (71%) 

Query: 1 MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGE--LMGYIM 58 

MT R F DLF FNNINLDPLTET+ I FYL YL WP +V + + LMGYIM 

Sbjct: 1 MTDTRKFKATDLFSFNNINLDPLTETFNISFYLSYLNKWPSLCVVQESDLSDPTLMGYIM 60 

Query: 59 GKAEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQV 118 

GK+EG+ +EWH HVTA++VAP RRLGLA +M+ LE + + FFVDLFVR SN + 
Sbjct: 61 GKSEGT--GKEWHTHVTAITVAPNSRRLGLARTMMDYLETVGNSENAFFVDLFVRASNAL 118 

Query: 119 AVNMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSI 165 

A++ YK LGYSVYR VI YYS +G+ DED++DMRK LSRD ++SI 
Sbjct: 119 AIDFYKGLGYSVYRRVIGYYSNPHGK-DEDSFDMRKPLSRDVNRESI 164 


Pedant information for DKFZphfbr2_3g8, frame 1 


Report for DKFZphfbr2_3g8.1 


[LENGTH] 178 
[MW] 20338.24 
[pI] 5.06 
[HOMOL] TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 7e-47 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPR131c] 6e-37 
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YHR013c] 4e-14 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YHR013c] 4e-14 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YHR013c] 4e-14 
[FUNCAT] r general function prediction [M. jannaschii, MJ1530] 6e-09 
[PIRKW] acyltransferase 1e-12 
[SUPFAM] arrest-defective protein 1 1e-12 
[SUPFAM] Escherichia coli peptide N-acetyltransferase rimI 1e-07 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 3 
[KW] Alpha_Beta 


SEQ MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK 

PRD ccccccccccchhhhhhcccccccccccchhhhhhcccccceeeeeeccccceeeehhhh 

SEQ AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV 

PRD hcccccccccccceeeeehhhhhhhhcchhhhhhhhhhhhhhccceeeeeeeecchhhhh 

SEQ NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE 

PRD hhhhhhcccchhhhhhccccccccccchhhhhhhhhhhhhhhhhcccccccccccccc 


Prosite for DKFZphfbr2_3g8.1 

PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 100->103 PKC_PHOSPHO_SITE PDOC00005 
PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005 
PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006 
PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006 
PS00006 141->145 CK2_PHOSPHO_SITE PDOC00006 


(No Pfam data available for DKFZphfbr2_3g8.1) 


DKFZphfbr2_312 


group: brain derived 

DKFZphfbr2_312 encodes a novel 589 amino acid protein with weak similarity to S. cerevisiae 
ubiquitin-like protein DSK2. 

Pfam predicts for this protein similarity to the ubiquitin family; no informative BLAST 
results; no predictive Prosite or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to ubiquitin-like protein DSK2 yeast 
complete cDNA, complete cds, EST hits 

Dsk2p is involved in spindle pole body (SPB) duplication, SPB = centomer 
strong similarity to HRIHFB2157 human mRNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2978 bp 

Poly A stretch at pos. 2958, polyadenylation signal at pos. 2924 

1 GGGGGGAGGA AGCGGTGGCT GCTGCGGATG TCGGTGTGAG CGAGCGGCGC 
51 CTGAACACAC GGCGGCTGCC GAGCGCCTGA CCCGGGCCTG CGCCAGAGCC 

101 TGCACCGAGC TCCGGGGCCC CACACCCGCT ACGGTGGCCC TGCGCCCGTT 

151 GCTACTGAGG CGGCGTGCTC TGCATTCTTC GCTGTCCAGG CCTGCCGGCT 

201 CTGGTGTCTG CTGGCTCCTC CTTGCTCGCC TGCTCCCTCC TGCTTGCCTG 

251 AGTCACCGCC GCCGCCGCCG CCACAGCCAT GGCCGAGAGT GGTGAAAGCG 

301 GCGGTCCTCC GGGCTCCCAG GATAGCGCCG CCGGAGCCGA AGGTGCTGGC 

351 GCCCCCGCGG CCGCTGCCTC CGCGGAGCCC AAAATCATGA AAGTCACCGT 

401 GAAGACCCCG AAGGAAAAGG AGGAATTCGC CGTGCCCGAG AATAGCTCCG 
451 TCCAGCAGTT TAAGGAAGAA ATCTCTAAAC GTTTTAAATC ACATACTGAC 

501 CAACTTGTGT TGATATTTGC TGGAAAAATT TTGAAAGATC AAGATACCTT 

551 GAGTCAGCAT GGAATTCATG ATGGACTTAC TGTTCACCTT GTCATTAAAA 

601 CACAAAACAG GCCTCAGGAT CATTCAGCTC AGCAAAC/UU^ TACAGCTGGA 

651 GGCAATGTTA CTACATCATC AACTCCTAAT AGTAACTCTA CATCTGGTTC 

701 TGCTACTAGC AACCCTTTTG GTTTAGGTGG CCTTGGGGGA CTTGCAGGTC 

751 TGAGTAGCTT GGGTTTGAAT ACTACCAACT TCTCTGAACT ACAGAGTCAG 

801 ATGCAGCGAC AACTTTTGTC TAACCCTGAA ATGATGGTCC AGATCATGGA 

851 AAATCCCTTT GTTCAGAGCA TGCTCTCAAA TCCTGACCTG ATGAGACAGT 

901 TAATTATGGC CAATCCACAA ATGCAGCAGT TGATACAGAG AAATCCAGAA 

951 ATTAGTCATA TGTTGAATAA TCCAGATATA ATGAGACAAA CGTTGGAACT 
1001 TGCCAGGAAT CCAGCAATGA TGCAGGAGAT GATGAGGAAC CAGGACCGAG 
1051 CTTTGAGCAA CCTAGAAAGC ATCCCAGGGG GATATAATGC TTTAAGGCGC 
1101 ATGTACACAG ATATTCAGGA ACCAATGCTG AGTGCTGCAC AAGAGCAGTT 
1151 TGGTGGTAAT CCATTTGCTT CCTTGGTGAG CAATACATCC TCTGGTGAAG 
1201 GTAGTCAACC TTCCCGTACA GAAAATAGAG ATCCACTACC CAATCCATGG 
1251 GCTCCACAGA CTTCCCAGAG TTCATCAGCT TCCAGCGGCA CTGCCAGCAC 
1301 TGTGGGTGGC ACTACTGGTA GTACTGCCAG TGGCACTTCT GGGCAGAGTA 
1351 CTACTGCGCC AAATTTGGTG CCTGGAGTAG GAGCTAGTAT GTTCAACACA 
1401 CCAGGAATGC AGAGCTTGTT GCAACAAATA ACTGAAAACC CACAACTGAT 
1451 GCAAAACATG TTGTCTGCCC CCTACATGAG AAGCATGATG CAGTCACTAA 
1501 GCCAGAATCC TGACCTTGCT GCACAGATGA TGCTGAATAA TCCCCTATTT 
1551 GCTGGAAATC CTCAGCTTCA AGAACAAATG AGACAACAGC TCCCAACTTT 
1601 CCTCCAACAA ATGCAGAATC CTGATACACT ATCAGCAATG TCAAACCCTA 
1651 GAGCAATGCA GGCCTTGTTA CAGATTCAGC AGGGTTTACA GACATTAGCA 
1701 ACGGAAGCCC CGGGCCTCAT CCCAGGGTTT ACTCCTGGCT TGGGGGCATT 
1751 AGGAAGCACT GGAGGCTCTT CGGGAACTAA TGGATCTAAC GCCACACCTA 
1801 GTGAAAACAC AAGTCCCACA GCAGGAACCA CTGAACCTGG ACATCAGCAG 
1851 TTTATTCAGC AGATGCTGCA GGCTCTTGCT GGAGTAAATC CTCAGCTACA 
1901 GAATCCAGAA GTCAGATTTC AGCAACAACT GGAACAACTC AGTGCAATGG 
1951 GATTTTTGAA CCGTGAAGCA AACTTGCAAG CTCTAATAGC AACAGGAGGT 
2001 GATATCAATG CAGCTATTGA AAGGTTACTG GGCTCCCAGC CATCATAGCA 
2051 GCATTTCTGT ATCTTGAAAA AATGTAATTT ATTTTTGATA ACGGCTCTTA 
2101 AACTTTAAAA TACCTGCTTT ATTTCATTTT GACTCTTGGA ATTCTGTGCT 
2151 GTTATAAACA AACCCAATAT GATGCATTTT AAGGTGGAGT ACAGTAAGAT 
2201 GTGTGGGTTT TTCTGTATTT TTCTTTTCTG GAACAGTGGG AATTAAGGCT 
2251 ACTGCATGCA TCACTTCTGC ATTTATTGTA ATTTTTTAAA AACATCACCT 
2301 TTTATAGTTG GGTGACCAGA TTTTGTCCTG CATCTGTCCA GTTTATTTGC 
2351 TTTTTAAACA TTAGCCTATG GTAGTAATTT ATGTAGAATA AAAGCATTAA 
2401 AAAGAAGCAA ATCATTTGCA CTCTATAATT TGTGGTACAG TATTGCTTAT 
2451 TGTGACTTTG GCATGCATTT TTGCAAACAA TGCTGTAAGA TTTATACTAC 
2501 TGATAATTTT GTTTTATTTG TATACAATAT AGAGTATGCA CATTTGGGAC 
2551 TGCATTTCTG GAAACATACT GCAATAGGCT CTCTGAGCAA AACACCTGTA 
2601 ACTAAAAAAG TGAAGATAAG AAAATACTCT TAAAGCTGAG TATTTCCTAA 
2651 TTGTATAGAA TCTTACAGCA TCTTTGACAA ACATCTCCCA GCAAAAGTGC 
2701 CGGTTAGTCA GGTTTGTTGA AAATACAGTA GAAAAGCTGA TTCTGGTTAT 
2751 CTCTTTAAGG ACAATTAATT GTACAGACAC ATAATGTAAC ATTGTCTCAA 
2801 CATTCATTCA CAGATTGACT GTAAATTACC TTAATCTTTG TGCAGACTGA 
2851 AGGAACACTG TAGTATACCC CAAAGTGCAT TTGCCTAGGA CTTCTCAGCT 
2901 TCTCCCATAG GTAGTTTAAC AGGCATTAAA ATTTGTAATT GAAATGTTGC 
2951 TTTCACTCAA AAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 279 bp to 2045 bp; peptide length: 589 
Category: similarity to known protein 


1 MAESGESGGP PGSQDSAAGA EGAGAPAAAA SAEPKIMKVT VKTPKEKEEF 
51 AVPENSSVQQ FKEEISKRFK SHTDQLVLIF AGKILKDQDT LSQHGIHDGL 
101 TVHLVIKTQN RPQDHSAQQT NTAGGNVTTS STPNSNSTSG SATSNPFGLG 
151 GLGGLAGLSS LGLNTTNFSE LQSQMQRQLL SNPEMMVQIM ENPFVQSMLS 
201 NPDLMRQLIM ANPQMQQLIQ RNPEISHMLN NPDIMRQTLE LARNPAMMQE 
251 MMRNQDRALS NLESIPGGYN ALRRMYTDIQ EPMLSAAQEQ FGGNPFASLV 
301 SNTSSGEGSQ PSRTENRDPL PNPWAPQTSQ SSSASSGTAS TVGGTTGSTA 
351 SGTSGQSTTA PNLVPGVGAS MFNTPGMQSL LQQITENPQL MQNMLSAPYM 
401 RSMMQSLSQN PDLAAQMMLN NPLFAGNPQL QEQMRQQLPT FLQQMQNPDT 
451 LSAMSNPRAM QALLQIQQGL QTLATEAPGL IPGFTPGLGA LGSTGGSSGT 
501 NGSNATPSEN TSPTAGTTEP GHQQFIQQML QALAGVNPQL QNPEVRFQQQ 
551 LEQLSAMGFL NREANLQALI ATGGDINAAI ERLLGSQPS 
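
The ORF coordinates and the peptide length quoted above can be cross-checked with a little arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies just outside the quoted range; the function name `orf_codon_count` is illustrative and not part of the original annotation.

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


# Coordinates quoted in the listing: ORF from 279 bp to 2045 bp.
print(orf_codon_count(279, 2045))  # 589, the stated peptide length

# Sequence row that starts at position 2001 (spaces removed).
row_start = 2001
row = "GATATCAATGCAGCTATTGAAAGGTTACTGGGCTCCCAGCCATCATAGCA"

# The three bases immediately after position 2045 (positions 2046..2048).
next_codon = row[2046 - row_start : 2049 - row_start]
print(next_codon)  # TAG, a stop codon
```

Under these assumptions the 1767 bp span gives exactly 589 codons, and the triplet that follows is a stop codon, which is consistent with the quoted range excluding the stop.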

BLASTP hits 

Entry CE1_1 from database TREMBL: 

"F15C11.2"; Caenorhabditis elegans cosmid VF15C11L 
Length = 293 

Score = 454 (159.8 bits), Expect = 4.4e-43, P = 4.4e-43 
Identities = 81/162 (50%), Positives = 113/162 (69%) 

Entry 554583 from database PIR: 

ubiquitin-like protein DSK2 - yeast (Saccharomyces cerevisiae) 
Length = 373 

Score = 278 (97.9 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 100/307 (32%), Positives = 155/307 (50%) 

Entry AB015344_1 from database TREMBLNEW: 

gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial cds. 
Score = 1135, P = 3.6e-115, identities = 227/301, positives = 253/301 
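
The percentages printed next to the identity and positive counts in the hits above appear to be truncated rather than rounded (for example 113/162 is about 69.8% but is shown as 69%). The following sketch reproduces the printed values under that assumption; `blast_percent` is a hypothetical helper, not a function of any BLAST package.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed report convention)."""
    return (100 * matches) // aligned


# (label, matches, aligned length) taken from the two hits listed above.
rows = [
    ("CE1_1 identities", 81, 162),
    ("CE1_1 positives", 113, 162),
    ("DSK2 identities", 100, 307),
    ("DSK2 positives", 155, 307),
]
for label, matches, aligned in rows:
    print(f"{label}: {matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```

Ordinary rounding would give 70% and 33% for the second and third rows, so truncation is the simpler explanation of the figures as printed.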


Alert BLASTP hits for DKFZphfbr2_312, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_312, frame 3 


Report for DKFZphfbr2_312.3 


[LENGTH] 589 
[MW] 62489.22 
[pI] 5.02 
[HOMOL] TREMBL:AB015344_1 gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial cds. 1e-121 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR276w] 2e-17 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YMR276w] 2e-17 
[BLOCKS] BL00299 Ubiquitin family proteins 
[SUPFAM] unassigned ubiquitin-related proteins 5e-16 
[SUPFAM] ubiquitin homology 5e-16 
[PROSITE] MYRISTYL 24 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 7 
[PFAM] Ubiquitin family 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 23.43 % 

SEQ MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQ 
SEG , . xxxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxx 
1aarA CEEEEEETTTCEEEECTTTTBHHH 

SEQ FKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQT 
SEG 
1aarA HHHHHHHHHCCCGGGEEEEETTEECTTTTBGGGGCCTTTTEEEEEBC 

SEQ NTAGGNVTTSSTPNSNSTSGSATSNPFGLGGLGGLAGLSSLGLNTTNFSELQSQMQRQLL 
SEG . . .xxxxxxxxxxxxxxxxxxxxxx. .xxxxxxxxxxxxxxxx 
1aarA 

SEQ SNPEMMVQIMENPFVQSMLSNPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLE 
SEG 
1aarA 

SEQ LARNPAMMQEMMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLV 
SEG 
1aarA 

SEQ SNTSSGEGSQPSRTENRDPLPNPWAPQTSQSSSASSGTASTVGGTTGSTASGTSGQSTTA 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
1aarA 

SEQ PNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYMRSMMQSLSQNPDLAAQMMLN 
SEG 
1aarA 

SEQ NPLFAGNPQLQEQMRQQLPTFLQQMQNPDTLSAMSNPRAMQALLQIQQGLQTLATEAPGL 
SEG 
1aarA 

SEQ IPGFTPGLGALGSTGGSSGTNGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQL 
SEG , . . .xxxxxxxxxxxxxxxxxxxxxxxx 
1aarA 

SEQ QNPEVRFQQQLEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS 
SEG 
1aarA 


Prosite for DKFZphfbr2_312.3 


PS00001  55->59    ASN_GLYCOSYLATION  PDOC00001 
PS00001  126->130  ASN_GLYCOSYLATION  PDOC00001 
PS00001  136->140  ASN_GLYCOSYLATION  PDOC00001 
PS00001  164->168  ASN_GLYCOSYLATION  PDOC00001 
PS00001  167->171  ASN_GLYCOSYLATION  PDOC00001 
PS00001  302->306  ASN_GLYCOSYLATION  PDOC00001 
PS00001  501->505  ASN_GLYCOSYLATION  PDOC00001 
PS00002  305->309  GLYCOSAMINOGLYCAN  PDOC00002 
PS00005  40->43    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  43->46    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  66->69    PKC_PHOSPHO_SITE   PDOC00005 
PS00006  43->47    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  71->75    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  181->185  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  200->204  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  260->264  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  304->308  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  312->316  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  506->510  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  572->576  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  8->14     MYRISTYL           PDOC00008 
PS00008  12->18    MYRISTYL           PDOC00008 
PS00008  19->25    MYRISTYL           PDOC00008 
PS00008  24->30    MYRISTYL           PDOC00008 
PS00008  95->101   MYRISTYL           PDOC00008 
PS00008  124->130  MYRISTYL           PDOC00008 
PS00008  140->146  MYRISTYL           PDOC00008 
PS00008  150->156  MYRISTYL           PDOC00008 
PS00008  153->159  MYRISTYL           PDOC00008 
PS00008  162->168  MYRISTYL           PDOC00008 
PS00008  267->273  MYRISTYL           PDOC00008 
PS00008  293->299  MYRISTYL           PDOC00008 
PS00008  308->314  MYRISTYL           PDOC00008 
PS00008  337->343  MYRISTYL           PDOC00008 
PS00008  343->349  MYRISTYL           PDOC00008 
PS00008  347->353  MYRISTYL           PDOC00008 
PS00008  355->361  MYRISTYL           PDOC00008 
PS00008  366->372  MYRISTYL           PDOC00008 
PS00008  479->485  MYRISTYL           PDOC00008 
PS00008  489->495  MYRISTYL           PDOC00008 
PS00008  492->498  MYRISTYL           PDOC00008 
PS00008  495->501  MYRISTYL           PDOC00008 
PS00008  499->505  MYRISTYL           PDOC00008 
PS00008  573->579  MYRISTYL           PDOC00008 
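
The position ranges in the Prosite listing above can be reproduced with a short pattern scan. The sketch below is illustrative only: the regular expressions are transcriptions of the published PROSITE patterns for PS00001, PS00005, PS00006 and PS00008, and the interpretation of each range as a 1-based start with an exclusive end is an assumption inferred from the listing. The helper name `scan` is hypothetical.

```python
import re

# Regex transcriptions of four PROSITE patterns (assumed, see lead-in).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",            # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                 # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",              # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",     # PS00008
}


def scan(seq: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 1-based start and exclusive end.

    A lookahead is used so that overlapping sites are all reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# First 60 residues of the frame 3 peptide, as printed in the SEQ row.
N_TERM = "MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQ"

for name, pattern in PATTERNS.items():
    print(name, scan(N_TERM, pattern))
```

For these 60 residues the scan yields 55->59 for ASN_GLYCOSYLATION, 40->43 and 43->46 for PKC_PHOSPHO_SITE, 43->47 for CK2_PHOSPHO_SITE, and 8->14, 12->18, 19->25 and 24->30 for MYRISTYL, which agrees with the corresponding rows of the listing.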


Pfam for DKFZphfbr2_312.3 


HMM_NAME Ubiquitin family 

HMM      * MQI rVKT L t G RTcT FEVepQE tVeqIKQHIeekEGIP PeQQRL I FaG RQ 
         M -I-+VKT + +F V+-*-»- V Q+K+ 1+ +Q -^LIFAG-«■ 
Query 37 MKVTVKTPK-EKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKI 

HMM      LEDeKTLsDYNIggeSTLHLVlR* 
         L D TLS+++I + T+HLV-I-+ 
Query 85 LKDQDTLSQHGIHDGLTVHLVIK 107 
DKFZphfbr2_62b11 


group: signal transduction 

DKFZphfbr2_62b11 encodes a novel 655 amino acid putative GTPase-activating protein, related to 
human chimaerins. 

The rac small GTPase is associated with type-I phosphatidylinositol 4-phosphate 5-kinase and 
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 


similarity to CHIMAERIN 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: /map="4" 

Insert length: 4593 bp 

Poly A stretch at pos. 4571, polyadenylation signal at pos. 4553 


1 GGGGGAGTTT GAAGACAGAA AGGAAAGGGG AGAAACCTGC AGAGAGCATC 
51 AAAGGATGGG GGGTGCTATA AAAGAAGCAG GGGGGTCCTT TGAAAGAAAT 
101 CTATCATGCA CTGAAATGCT TTCTGGAGAA GGTGCCGTTA TTTTCCTCCC 
151 CTCTTGCTCA GATGAAAGGA GCCAGCAAGG ACAGTCCTGA AATATTCCTC 
201 AGGGGACTTT TTGTCATTGT TCCTCTTTCC TCTTGCACAG AGCTATTTGC 
251 TGACCTTTCC AGAGGAATCT CAGTCCAGCT GAGAAGACAG TTCTTAATAA 
301 AAACAAAAAA ATGCAAAAAC CAATTCCTGC TGTTTGAATG GGAATGGTAG 
351 CTTGCTTGCT GCAGTTCTTT TCCTGTGACA TTTTGGAATG TCTGCAGAAA 
401 CTTAAAAAAA AGAAAAAAAA AACCTTAAAA ACTCCCTGGA TTAGGCAAGA 
451 GAAAAGGAAG TTTTTTTTTG CTAAACAGGA GTAAATGAGA GGTGGTAACT 
501 TATCCCTAAG CCAGGACCTG GATGATCAAA ACCTTCAAAT TCTAGGGATC 
551 AGCACTTCAA AAATAACAAG TAAACAAGCA TGAGGAGTGG CTGTTGGGTT 
601 TCGCTCAGAG GCAGGTTTTA AAGGAAGCCA AAACCGGGTT CAGAACTTCA 
651 GGCCTGTACG ATGCCTGAAG ACCGGAATTC TGGGGGGTGC CCGGCTGGTG 
701 CCTTAGCCTC AACTCCTTTC ATCCCTAAAA CTACATACAG AAGAATCAAA 
751 CGGTGTTTTA GTTTTCGGAA AGGCATTTTT GGACAGAAAC TGGAGGATAC 
801 TGTTCGTTAT GAGAAGAGAT ATGGGAACCG TCTGGCTCCG ATGTTGGTGG 
851 AGCAGTGCGT GGACTTTATC CGACAAAGGG GGCTGAAAGA AGAGGGTCTC 
901 TTTCGACTGC CAGGCCAGGC TAATCTTGTT AAGGAGCTCC AAGATGCCTT 
951 TGACTGTGGG GAGAAGCCAT CATTTGACAG CAACACAGAT GTACACACGG 
1001 TGGCATCACT TCTTAAGCTG TACCTCCGAG AACTTCCAGA ACCAGTTATT 
1051 CCTTATGCGA AGTATGAAGA TTTTTTGTCA TGTGCCAAAC TGCTCAGCAA 
1101 GGAAGAGGAA GCAGGTGTTA AGGAATTAGC AAAGCAGGTG AAGAGTTTGC 
1151 CAGTGGTAAA TTACAACCTC CTCAAGTATA TTTGCAGATT CTTGGATGAA 
1201 GTACAGTCCT ACTCGGGAGT TAACAAAATG AGTGTGCAGA ACTTGGCAAC 
1251 GGTCTTTGGT CCTAATATCC TGCGCCCCAA AGTGGAAGAT CCTTTGACTA 
1301 TCATGGAGGG CACTGTGGTG GTCCAGCAGT TGATGTCAGT GATGATTAGC 
1351 AAACATGATT GCCTCTTTCC CAAAGATGCA GAACTACAAA GCAAGCCCCA 
1401 AGATGGAGTG AGCAACAACA ATGAAATTCA GAAGAAAGCC ACCATGGGGC 
1451 TGTTACAGAA CAAGGAGAAC AATAACACCA AGGACAGCCC TAGTAGGCAG 
1501 TGCTCCTGGG ACAAGTCTGA GTCACCCCAG AGAAGCAGCA TGAACAATGG 
1551 ATCCCCCACA GCTCTATCAG GCAGCAAAAC CAACAGCCCA AAGAACAGTG 
1601 TTCACAAGCT AGATGTGTCT AGAAGCCCCC CTCTCATGGT CAAAAAGAAC 
1651 CCAGCCTTTA ATAAGGGTAG TGGGATAGTT ACCAATGGGT CCTTCAGCAG 
1701 CAGTAATGCA GAAGGTCTTG AGAAAACCCA AACCACCCCC AATGGGAGCC 
1751 TACAGGCCAG AAGGAGCTCT TCACTGAAGG TATCTGGTAC CAAAATGGGC 
1801 ACGCACAGTG TACAGAATGG AACGGTGCGC ATGGGCATTT TGAACAGCGA 
1851 CACACTCGGG AACCCCACAA ATGTTCGAAA CATGAGCTGG CTGCCAAATG 
1901 GCTATGTGAC CCTGAGGGAT AACAAGCAGA AAGAACAAGC TGGAGAGTTA 
1951 GGCCAGCACA ACAGACTGTC CACCTATGAT AATGTCCATC AACAGTTCTC 
2001 CATGATGAAC CTTGATGACA AGCAGAGCAT TGACAGTGCT ACCTGGTCCA 
2051 CTTCCTCCTG TGAAATCTCC CTCCCTGAGA ACTCCAACTC CTGTCGCTCT 
2101 TCTACCACCA CCTGCCCAGA GCAAGACTTT TTTGGGGGGA ACTTTGAGGA 
2151 CCCTGTTTTG GATGGGCCCC CGCAGGACGA CCTTTCCCAC CCCAGGGACT 
2201 ATGAAAGCAA AAGTGACCAC AGGAGTGTGG GAGGTCGAAG TAGTCGTGCC 
2251 ACCAGTAGCA GTGACAACAG TGAGACATTT GTGGGCAACA GCAGCAGCAA 
2301 CCACAGTGCA CTGCACAGTT TAGTTTCCAG CCTGAAACAG GAAATGACCA 
2351 AACAGAAGAT AGAGTATGAG TCCAGGATAA AGAGCTTAGA ACAGCGAAAC 
2401 TTGACTTTGG AAACAGAAAT GATGAGCCTC CATGATGAAC TGGATCAGGA 
2451 GAGGAAAAAG TTCACAATGA TAGAAATAAA AATGCGAAAT GCCGAGCGAG 
2501 CAAAAGAAGA TGCCGAGAAA AGAAATGACA TGCTACAGAA AGAAATGGAG 
2551 CAGTTTTTTT CCACGTTTGG AGAACTGACA GTGGAACCCA GGAGAACCGA 
2601 GAGAGGAAAC ACAATATGGA TTCAGTGAGC CTGCTTTCGC CTGCTGTCTC 
2651 TGATGGCTCT GGCAAGGACT CCAGGGATTC TGGTGGGATA TGACTTAGAA 
2701 CCAGGTGGCT GGTCACCTGG ATGTACAGAA GTCTAACTGG TGAAGGAATA 
2751 TCATTTACAG ACATTAAACA TCCATATCTG CAATGTGTAC CAAAGTTATA 
2801 TCATGCCCCA TAATGCTACT GTCAAGTGTT ACAACTGGAT ATGTGTATAT 
2851 AGAGTAGTTT TTCAAAAGTA AACTAAAAAT GAGAAGCATA TTTCAAGAAT 
2901 TATTTTATTG CAAGTCTTGT ATTTAAATGT TAAATCAATA TGTTGTTGCA 
2951 ATTTAGCTTG CTTTCAAGCT TCACCCCTTG CACTTAACAT AAGCTATTTT 
3001 TGGCATTGTG TTATCATCGG CTTATTTTAT AGATCAATAT TTTTATTTCC 
3051 CTTTTTTGCT GAGGAAATGA AGATAAGCAA AAATATAAAT ATATATATAA 
3101 ATATATGAGT TATTAAAACC AGAAGAATAC TTTGTGGCTG TGCTGTTTGT 
3151 GCCAATAGAC TTTGTCATGA CCAAAAAGAG AAATGTAAAT AGTTTTATAA 
3201 AATACAGTCG AATCACCAGG AACCTTTGAG CTGCTTTTAA AATTCTTCCC 
3251 CTGGCACCAC TCAGTTTTGC TTTTGCGAGG CGATTTGACA TAGGAACTTT 
3301 GAGACTCCAT GAGAAAGTCC CTTTCTGAGG CCCACTGTCT ACCTTGCCAG 
3351 ATCCTCAGTG CGTATCGCCA ATGCAGGATG CTCCTTAGAA AAGAAAAAAT 
3401 GGTAAAGGAT GGCATTTAAC GATTCAGGCT TTGAATTACT CTGTCCCTCT 
3451 GGACCGAATC TCTTTAACTG CTGGATAGTT TTAGAGGAAT TCTCCTGCTA 
3501 CTTAGGTACT GGGAAACAAT GCTTGCTAAA CCATGCCCAC GTGAGCACCT 
3551 GTCTCCCACT CAAACCTCTC CCATCTCCCA ACAACTGCAC TTTAGAATAC 
3601 CAGCAGTGAA ATGGTATTAC TGTTTCCCTC TGAGTGAAAC TGCTAGAGTA 
3651 TATGTCACGT AGTGACATTT TTTTCTCACT CAGGCTATTG CCATCTGGGA 
3701 TTCTCTCCCT ACTACAGCTG GCAAAGTTGG TTTGCAGCAA GAAGATAGTG 
3751 GGAGGGGGCC AGGCTGCAGG AGAAGGAGAA AAGTTTAGAA GAAACAAACC 
3801 ATTTTGCTTC TAATTTTGAC AGTATCACTT TCCTGTTAAA ACATACAATA 
3851 ATTTTAAAAG GTGAATGCCT AAAGTTCCAA TTTTAGCAAA TATGGGAACC 
3901 TCAGCAATGC TAATTTTCTA GAAAAACCCA GGGCTCTTTG GAGCTAGAGT 
3951 TTTGGGAGAA CAGTTCTTCA CAATAAGGCA ATGGTTTTGA GAGGCCAGGC 
4001 AAATAATCTT TCTCACCGTA GAACAAAAAG TTACAAAAGG CATAATCGGA 
4051 AATAGAGACT ACATACTTGA GTTTATGGGG TTTGTGTTGT TTGAAGGTTC 
4101 AATGCTTGCA TGTGTTTATT TATTTTCAAG AGGGAAAGTG GTCTGTACTG 
4151 CTTTCATCCT TGCCACTGTC TTGCTTTTAT TTTTTACTCT CCCACTGAGC 
4201 AAGCGTCTGT GGTCCTATGG TATCAACCAG TATCTTTATA GCAATAATTT 
4251 CTTTAATTCC CTTTTCTCTC TCTTTCCAAT TATTTAACCA GTTACTTCCA 
4301 CCTGGACATA CGATAGGAAA TTCAAACTCA AAATATGAAA ATTGATCTTA 
4351 ATAACTCTCC CTTCATATCT TTTCACCTAT TTCCAGTCCT TATCATAGTT 
4401 GATAAAAACC TCAGACTCAT CCAGAAAGCT ATATGATGCA CTAGTAAAAA 
4451 AAACAAAGAT ATTTAAACTG CTTGGGTTCA AATGGTATAC AATTTGCCAG 
4501 CTGTTACTGA ACCTTCTATG CATAACTTTT TTTTTCCTCT GTGCAATTGG 
4551 AATAATAAAA ATACTACTCC CATAAAAAAA AAAAAAAAAA AAC 
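
The polyadenylation signal position quoted in the header of this entry can be related to the final sequence row with a simple motif search. The sketch below assumes the canonical AATAAA hexamer is the signal meant and that the reported position is a zero-based offset; both are inferences from the printed numbers rather than statements in the listing, and `find_signal` is a hypothetical helper.

```python
def find_signal(row: str, row_start_1based: int, motif: str = "AATAAA"):
    """Zero-based offset of the first motif occurrence in a sequence row.

    `row_start_1based` is the position label printed at the start of the row.
    Returns None when the motif is absent.
    """
    idx = row.find(motif)
    if idx < 0:
        return None
    return (row_start_1based - 1) + idx


# Final row of the insert sequence above (label 4551, spaces removed).
last_row = "AATAATAAAAATACTACTCCCATAAAAAAAAAAAAAAAAAAAC"
print(find_signal(last_row, 4551))  # 4553
```

With 1-based numbering the same hexamer spans positions 4554 to 4559, so the quoted value of 4553 matches only under the zero-based reading.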


BLAST Results 


Entry G38474 from database EMBLNEW: 

SHGC-58303 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 2175, P = 1.2e-92, identities = 439/441 


Medline entries 


97476250: 

Beta2-chimaerin is a high affinity receptor for the phorbol ester tumor 
promoters . 


Peptide information for frame 1 


ORF from 661 bp to 2625 bp; peptide length: 655 
Category: similarity to known protein 


1 MPEDRNSGGC PAGALASTPF IPKTTYRRIK RCFSFRKGIF GQKLEDTVRY 
51 EKRYGNRLAP MLVEQCVDFI RQRGLKEEGL FRLPGQANLV KELQDAFDCG 
101 EKPSFDSNTD VHTVASLLKL YLRELPEPVI PYAKYEDFLS CAKLLSKEEE 
151 AGVKELAKQV KSLPVVNYNL LKYICRFLDE VQSYSGVNKM SVQNLATVFG 
201 PNILRPKVED PLTIMEGTVV VQQLMSVMIS KHDCLFPKDA ELQSKPQDGV 
251 SNNNEIQKKA TMGLLQNKEN NNTKDSPSRQ CSWDKSESPQ RSSMNNGSPT 
301 ALSGSKTNSP KNSVHKLDVS RSPPLMVKKN PAFNKGSGIV TNGSFSSSNA 
351 EGLEKTQTTP NGSLQARRSS SLKVSGTKMG THSVQNGTVR MGILNSDTLG 
401 NPTNVRNMSW LPNGYVTLRD NKQKEQAGEL GQHNRLSTYD NVHQQFSMMN 
451 LDDKQSIDSA TWSTSSCEIS LPENSNSCRS STTTCPEQDF FGGNFEDPVL 
501 DGPPQDDLSH PRDYESKSDH RSVGGRSSRA TSSSDNSETF VGNSSSNHSA 
551 LHSLVSSLKQ EMTKQKIEYE SRIKSLEQRN LTLETEMMSL HDELDQERKK 




601 FTMIEIKMRN AERAKEDAEK RNDMLQKEME QFFSTFGELT VEPRRTERGN 
651 TIWIQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62b11, frame 1 

SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053., N = 3, Score = 
661, P = 2.4e-89 

TREMBL:HSU90908_1 product: "unknown"; Human clones 23549 and 23762 
mRNA, complete cds., N = 1, Score = 348, P = 1.1e-29 

PIR:S29128 N-chimerin - rat, N = 1, Score = 286, P = 2.8e-24 

PIR:S29956 beta-chimerin - rat, N = 1, Score = 279, P = 1.6e-23 

TREMBL:AB014572_1 gene: "KIAA0672"; product: "KIAA0672 protein"; Homo 
sapiens mRNA for KIAA0672 protein, complete cds., N = 1, Score = 314, P 
= 1e-24 

>SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 
Length = 638 

HSPs: 

Score = 661 (99.2 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 122/209 (58%), Positives = 160/209 (76%) 

Query: 38 GIFGQKLEDTVRYEKRYGNRLAPMLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAF 97 

G+FGQ+L++TV YE+++G L P+LVE+C +FI + G EEG+FRLPGQ NLVK+L+DAF 
Sbjct: 148 GVFGQRLDETVAYEQKFGPHLVPILVEKCAEFILEHGRNEEGIFRLPGQDNLVKQLRDAF 207 

Query: 98 DCGEKPSFDSNTDVHTVASLLKLYLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELA 157 

D GE+PSFD +TDVHTVASLLKLYLR+LPEPV+P+++YE FL C +L + +E +EL 
Sbjct: 208 DAGERPSFDRDTDVHTVASLLKLYLRDLPEPVVPWSQYEGFLLCGQLTNADEAKAQQELM 267 

Query: 158 KQVKSLPVVNYNLLKYICRFLDEVQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEG 217 

KQ+ LP NY+LL YICRFL E+Q VNKMSV NLATV G N++R KVEDP IM G 
Sbjct: 268 KQLSILPRDNYSLLSYICRFLHEIQLNCAVNKMSVDNLATVIGVNLIRSKVEDPAVIMRG 327 

Query: 218 TVVVQQLMSVMISKHDCLFPKDAELQSKP 246 

T +Q++M++MI H+ LFPK ++ P 
Sbjct: 328 TPQIQRVMTMMIRDHEVLFPKSKDIPLSP 356 

Score = 210 (31.5 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 45/115 (39%), Positives = 73/115 (63%) 

Query: 531 TSSSDNSETFVGNSSSNHSALHSL---VSSLKQEMTKQKIEYESRIKSLEQRNLTLETEM 587 

T +S NSET G +S + SL V L++E+ QK YE +IK+LE+ N + ++ 
Sbjct: 523 TLASPNSETGPGKKNSGEEEIDSLQRMVQELRKEIETQKQMYEEQIKNLEKENYDVWAKV 582 

Query: 588 MSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVE 642 

+ L++EL++E+KK +EI +RN ER++ED EKRN L++E+++F + E E 
Sbjct: 583 VRLNEELEKEKKKSAALEISLRNMERSREDVEKRNKALEEEVKEFVKSMKEPKTE 637 

Score = 70 (10.5 bits), Expect = 1.2e-74, Sum P(3) = 1.2e-74 
Identities = 28/121 (23%), Positives = 54/121 (44%) 

Query: 528 SRATSSSDNSETFVGNSSSNHSALHSLVSSLKQE-MTKQKIEYESRIKSLEQRNL-TLET 585 

S+ TS+ DN + G+ SAL S K + + E K+ + + +L+ 

Sbjct: 489 SQRTSTYDNVPSLPGSPGEEASALSSQACDSKGDTLASPNSETGPGKKNSGEEEIDSLQR 548 

Query: 586 EMMSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRR 645 

+ L E++ +++ M E +++N E+ D + L +E+E+ L + R 

Sbjct: 549 MVQELRKEIETQKQ---MYEEQIKNLEKENYDVWAKVVRLNEELEKEKKKSAALEISLRN 605 

Query: 646 TER 648 

ER 

Sbjct: 606 MER 608 

Score = 53 (8.0 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 31/111 (27%), Positives = 46/111 (41%) 

Query: 344 SFSSSNAEGLEKTQTTPNGSLQARRSSSLKVSGTKMGTHSVQNG TV— RMGILNSD 397 

SFSS ++ + T T A S KV K G +0+ T+ R L S 

Sbjct: 388 SFSSMTSDS-DTTSPTGQQPSDAFPEDSSKVPREKPGDWKMQSRKRTQTLPNRKCFLTSA 446 
Query: 398 TLG-NPTNV---RNMSWLPNGYVTLRDNKQKEQAGELGQ---HNRLSTYDNV 442 

G N + + +N W P+ + ++ + +L Q R STYDNV 

Sbjct: 447 FQGANSSKMEIFKNEFWSPSSEAKAGEGHRRTMSQDLRQLSDSQRTSTYDNV 498 

Score = 53 (8.0 bits), Expect = 3.5e-14, Sum P(3) = 3.5e-14 


Identities = 32/125 (25%), Positives = 56/125 (44%) 

Query: 242 LQSKPQDG— VSNNNEIQKKATMGLLQNKEN— NNTKD— -SPSRQCSWDKSESPQRSS 293 

++SK +D + +IQ+ TM ++++ E +KD SP Q + K RSS 

Sbjct: 314 IRSKVEDPAVIMRGTPQIQRVMTM-MIRDHEVLFPKSKDIPLSPPAQKNDPKKAPVARSS 372 

Query: 294 MNNGSPTALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGL 353 

+ + L S+T+S + D+P-f + AF+SV + 

Sbjct: 373 VGWDATEDLRISRTDSFSSMTSDSDTTS--PTGQQPSDAFPEDSSKVPREKPGDWKMQSR 430 

Query: 354 EKTQTTPN 361 

++TQT PN 
Sbjct: 431 KRTQTLPN 438 


Pedant information for DKFZphfbr2_62b11, frame 1 


Report for DKFZphfbr2_62b11.1 

[LENGTH] 655 
[MW] 73394.60 
[pI] 8.13 
[HOMOL] SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 3e-71 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPL115c] 1e-16 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 1e-16 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YPL115c] 1e-16 
[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 1e-16 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-16 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-16 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YDR379w] 4e-16 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-15 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 2e-13 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 2e-13 
[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 2e-46 
[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 6e-37 
[PIRKW] phosphotransferase 3e-13 
[PIRKW] breakpoint cluster region 2e-20 
[PIRKW] transmembrane protein 7e-14 
[PIRKW] brain 2e-20 
[PIRKW] alternative splicing 2e-20 
[PIRKW] P-loop 9e-19 
[PIRKW] cytoskeleton 1e-08 
[SUPFAM] CDC24 homology 7e-21 
[SUPFAM] bcr protein 7e-21 
[SUPFAM] myosin motor domain homology 9e-19 
[SUPFAM] pleckstrin repeat homology 2e-15 
[SUPFAM] LIM metal-binding repeat homology 9e-15 
[SUPFAM] protein kinase C zinc-binding repeat homology 5e-24 
[PROSITE] MYRISTYL 16 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 15 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PROSITE] ASN_GLYCOSYLATION 8 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 6.87 % 
[KW] COILED_COIL 12.06 % 


SEQ MPEDRNSGGCPAGALASTPFIPKTTYRRIKRCFSFRKGIFGQKLEDTVRYEKRYGNRLAP 
SEG : 
COILS 
1rgp- !!!!!!!!!!!!! !!!!c 

SEQ MLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAFDCGEKPSFDSNTDVHTVASLLKL 
SEG 
COILS 
1rgp- HHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCGGG 

SEQ YLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELAKQVKSLPVVNYNLLKYICRFLDE 
SEG 
COILS 
1rgp- HHHHTTTTTTTGGGHHHHHH TTTTCGGGHHHHHHHHHHHCCHHHHHHHHHHHHHHHH 

SEQ VQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEGTVVVQQLMSVMISKHDCLFPKDA 
SEG 
COILS 
1rgp- HHHHHHHHCCCHHHHHHHHGGGCC 

SEQ ELQSKPQDGVSNNNEIQKKATMGLLQNKENNNTKDSPSRQCSWDKSESPQRSSMNNGSPT 
SEG 
COILS 
1rgp- 

SEQ ALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGLEKTQTTP 
SEG 
COILS 
1rgp- 

SEQ NGSLQARRSSSLKVSGTKMGTHSVQNGTVRMGILNSDTLGNPTNVRNMSWLPNGYVTLRD 
SEG 
COILS 
1rgp- 

SEQ NKQKEQAGELGQHNRLSTYDNVHQQFSMMNLDDKQSIDSATWSTSSCEISLPENSNSCRS 
SEG xxxxxxx 
COILS 
1rgp- 

SEQ STTTCPEQDFFGGNFEDPVLDGPPQDDLSHPRDYESKSDHRSVGGRSSRATSSSDNSETF 
SEG XXXXX XXXXXXXXXXXXXXXXX . . . 
COILS 
1rgp- 

SEQ VGNSSSNHSALHSLVSSLKQEMTKQKIEYESRIKSLEQRNLTLETEMMSLHDELDQERKK 
SEG . .XXXXXXXXXXXXXXXX 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
1rgp- 

SEQ FTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRRTERGNTIWIQ 
SEG 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
1rgp- 


Prosite for DKFZphfbr2_62b11.1 


PS00001  271->275  ASN_GLYCOSYLATION  PDOC00001 
PS00001  342->346  ASN_GLYCOSYLATION  PDOC00001 
PS00001  361->365  ASN_GLYCOSYLATION  PDOC00001 
PS00001  386->390  ASN_GLYCOSYLATION  PDOC00001 
PS00001  407->411  ASN_GLYCOSYLATION  PDOC00001 
PS00001  543->547  ASN_GLYCOSYLATION  PDOC00001 
PS00001  547->551  ASN_GLYCOSYLATION  PDOC00001 
PS00001  580->584  ASN_GLYCOSYLATION  PDOC00001 
PS00004  258->262  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  367->371  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  599->603  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  25->28    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  34->37    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  47->50    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  309->312  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  371->374  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  388->391  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  417->420  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  477->480  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  527->530  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  557->560  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  646->649  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  107->111  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  146->150  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  213->217  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  230->234  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  348->352  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  417->421  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  437->441  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  465->469  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  470->474  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  484->488  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  516->520  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  532->536  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  589->593  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  602->606  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  635->639  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  43->51    TYR_PHOSPHO_SITE   PDOC00007 
PS00007  176->185  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  8->14     MYRISTYL           PDOC00008 
PS00008  9->15     MYRISTYL           PDOC00008 
PS00008  13->19    MYRISTYL           PDOC00008 
PS00008  249->255  MYRISTYL           PDOC00008 
PS00008  263->269  MYRISTYL           PDOC00008 
PS00008  297->303  MYRISTYL           PDOC00008 
PS00008  304->310  MYRISTYL           PDOC00008 
PS00008  338->344  MYRISTYL           PDOC00008 
PS00008  343->349  MYRISTYL           PDOC00008 
PS00008  352->358  MYRISTYL           PDOC00008 
PS00008  362->368  MYRISTYL           PDOC00008 
PS00008  376->382  MYRISTYL           PDOC00008 
PS00008  392->398  MYRISTYL           PDOC00008 
PS00008  400->406  MYRISTYL           PDOC00008 
PS00008  524->530  MYRISTYL           PDOC00008 
PS00008  542->548  MYRISTYL           PDOC00008 


(No Pfam data available for DKFZphfbr2_62b11.1) 
DKFZphfbr2_62f10 


group: intracellular transport and trafficking 

DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with strong similarity to mammalian 
zinc transporter proteins. 

The novel protein is a membrane protein, which should be involved in the transport of zinc 
across the cell membrane. 

The ZnT transporters are membrane proteins that facilitate sequestration of zinc in 
endosomal vesicles. In the brain, ZnT-3 mRNA seems to be involved in the accumulation of zinc 
in synaptic vesicles. Zinc (Zn) is an essential element in normal development and metabolism. 
Recent studies show that in Alzheimer's disease, Zn functions as a double-edged sword, 
affording protection against Alzheimer's amyloid beta peptide (the major component of senile 
plaques) at low concentrations and enhancing toxicity at high concentrations by accelerated 
aggregation of the amyloid beta peptide. 

The new protein can find application in modulation of zinc transport in neuronal cells, thus 
providing a means of modulating Alzheimer's amyloid beta peptide plaque formation. 


strong similarity to zinc transporter proteins; 
membrane regions: 5 

Summary: DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with 
similarity to zinc transporter proteins. 

The new protein can find clinical application in modulating Zn2+ 
uptake. 


strong similarity to zinc transporter proteins 

complete cDNA, complete cds, few EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 5422 bp 

Poly A stretch at pos. 5397, polyadenylation signal at pos. 5381 


1 GTCTAACTTT GGAAATATCA CCCTCATGCT GTCTTCCCAG GATGTCTCTC 
51 TCCCTAAGTA AGGGATGTTA CTTCCTGGAG GGAATGCAGT GTTGGGAATC 
101 TGAAGACCCA GCTTTGAGCT GAATTTGCTT TGTGATACCT GGAGAGAAGA 
151 CGTGTTTTCT TGACAACAGC ACAGTACCTA GTGAGTTCAA CAACAACGAC 
201 AACAACAGCC GCAGCTCATC CTGGCCGTCA TGGAGTTTCT TGAAAGAGCG 
251 TATCTTGTGA ATGATAAAGC TGCCAAGATG TATGCTTTCA CACTAGAAAG 
301 AAGGAGCTGC AAATGAACAC TTCATAGCAA TGTGGAACTC CAACAGAAAC 
351 CGGTGAATAA AGATCAGTGT CCCAGAGAGA GACCAGAGGA GCTGGAGTCA 
401 GGAGGCATGT ACCACTGCCA CAGTGGCTCC AAGCCCACAG AAAAGGGGGC 
451 GAATGAGTAC GCCTATGCCA AGTGGAAACT CTGTTCTGCT TCAGCAATAT 
501 GCTTCATTTT CATGATTGCA GAGGTCGTGG GTGGGCACAT TGCTGGGAGT 
551 CTTGCTGTTG TCACAGATGC TGCCCACCTC TTAATTGACC TGACCAGTTT 
601 CCTGCTCAGT CTCTTCTCCC TGTGGTTGTC ATCGAAGCCT CCCTCTAAGC 
651 GGCTGACATT TGGATGGCAC CGAGCAGAGA TCCTTGGTGC CCTGCTCTCC 
701 ATCCTGTGCA TCTGGGTGGT GACTGGCGTG CTAGTGTACC TGGCATGTGA 
751 GCGCCTGCTG TATCCTGATT ACCAGATCCA GGCGACTGTG ATGATCATCG 
801 TTTCCAGCTG CGCAGTGGCG GCCAACATTG TACTAACTGT GGTTTTGCAC 
851 CAGAGATGCC TTGGCCACAA TCACAAGGAA GTACAAGCCA ATGCCAGCGT 
901 CAGAGCTGCT TTTGTGCATG CCCCTGGAGA TCTATTTCAG AGTATCAGTG 
951 TGCTAATTAG TGCACTTATT ATCTACTTTA AGCCAGAGTA TAAAATAGCC 
1001 GACCCAATCT GCACATTCAT CTTTTCCATC CTGGTCTTGG CCAGCACCAT 
1051 CACTATCTTA AAGGACTTCT CCATCTTACT CATGGAAGGT GTGCCAAAGA 
1101 GCCTGAATTA CAGTGGTGTG AAAGAGCTTA TTTTAGCAGT CGACGGGGTG 
1151 CTGTCTGTGC ACTGCCTGCA CATCTGGTCT CTAACAATGA ATCAAGTAAT 
1201 TCTCTCAGCT CATGTTGCTA CAGCAGCCAG CCGGGACAGC CAAGTGGTTC 
1251 GGAGAGAAAT TGCTAAAGCC CTTAGCAAAA GCTTTACGAT GCACTCACTC 
1301 ACCATTCAGA TGGAATCTCC AGTTGACCAG GACCCCGACT GCCTTTTCTG 
1351 TGAAGACCCC TGTGACTAGC TCAGTCACAC CGTCAGTTTC CCAAATTTGA 
1401 CAGGCCACCT TCAAACATGC TGCTATGCAA TTTCTGCATC ATAGAAAATA 
1451 AGGAACCAAA GGAAGAAATT CATGTCATGG TGCAATGCAT ATTTTATCTA 
1501 TTTATTTAGT TCCATTCACC ATGAAGGAAG AGGCACTGAG ATCCATCAAT 
1551 CAATTGGATT ATATACTGAT CAGTAGCTGT GTTCAATTGC AGGAATGTGT 
1601 ATATAGATTA TTCCTGAGTG GAGCCGAAGT AACAGCTGTT TGTAACTATC 
1651 GGCAATACCA AATTCATCTC CCTTCCAATA ATGCATCTTG AGAACACATA 
1701 GGTAAATTTG AACTCAGGAA AGTCTTACTA GAAATCAGTG GAAGGGACAA 
1751 ATAGTCACAA AATTTTACCA AAACATTAGA AACAAAAAAT AAGGAGAGCC 
1801 AAGTCAGGAA TAAAAGTGAC TCTGTATGCT AACGCCACAT TAGAACTTGG 
1851 TTCTCTCACC AAGCTGTAAT GTGATTTTTT TTTCTACTCT GAATTGGAAA 
1901 TATGTATGAA TATACAGAGA AGTGCTTACA ACTAATTTTT ATTTACTTGT 
1951 CACATTTTGG CAATAAATCC CTCTTATTTC TAAATTCTAA CTTGTTTATT 
2001 TCAAAACTTT ATATAATCAC TGTTCAAAAG GAAATATTTT CACCTACCAG 
2051 AGTGCTTAAA CACTGGCACC AGCCAAAGAA TGTGGTTGTA GAGACCCAGA 
2101 AGTCTTCAAG AACAGCCGAC AAAAACATTC GAGTTGACCC CACCAAGTTG 
2151 TTGCCACAGA TAATTTAGAT ATTTACCTGC AAGAAGGAAT AAAGCAGATG 
2201 CAACCAATTC ATTCAGTCCA CGAGCATGAT GTGAGCACTG CTTTGTGCTA 
2251 GACATTGGGC TTAGCACTGA AACTATAAAG AGGAATCAGA CGCAGCAAGT 
2301 GCTTCTGTGT TCTGGTAGCA ACTCAACACT ATCTGTGGAG AGTAAACTGA 
2351 AGATGTGCAG GCCAACATTC TGGAAATCCT ATGTCAGTGG GTTTGGTTTG 
2401 GAACCTGGAC TTCTGCATTT TTAAAAGTTA CCCAGAGATG CTTCTAAAGA 
2451 TGAGCCATAG TCTAGAAGAT TGTCAACCAC AGGAGTTCAT TGAGTGGGAC 
2501 AGCTAGACAC ATACATTGGC AGTTACAATA GTATCATGAA TTGCAATGAT 
2551 GTAGTGGGGT ATAAAAGGAA AGCGATGGAT ATTGCCGGAT GGGCATGGCC 
2601 AGTGATGTTT CACGTCATTG AGGTGACAGC TCTGCTGGAC TTTGAATTAC 
2651 ATATGGAGGC TCTCCAGGAA GACGAAGAAG AGAAGGACAT TCTAGGCAAA 
2701 AAGAAGACTA GGCACAAGGC ACACTTATGT TTGTCTGTTA GCTTTTAGTT 
2751 GAAAAAGCAA AATACATGAT GCAAAGAAAC CTCTCCACGC TGTGATTTTT 
2801 AAAACTACAT ACTTTTTGCA ACTTTATGGT TATGAGTATT GTAGAGAACA 
2851 GGAGATAGGT CTTAGATGAT TTTTATGTTG TTGTCAGACT CTAGCAAGGT 
2901 ACTAGAAACC TAGCAGGCAT TAATAATTGT TGAGGCAATG ACTCTGAGGC 
2951 TATATCTGGG CCTTGTCATT ATTTATCATT TATATTTGTA TTTTTTTCTG 
3001 AAATTTGAGG GCCAAGAAAA CATTGACTTT GACTGAGGAG GTCACATCTG 
3051 TGCCATCTCT GCAAATCAAT CAGCACCACT GAAATAACTA CTTAGCATTC 
3101 TGCTGAGCTT TCCCTGCTCA GTAGAGACAA ATATACTCAT CCCCCACCTC 
3151 AGTGAGCTTG TTTAGGCAAC CAGGATTAGA GCTGCTCAGG TTCCCAACGT 
3201 CTCCTGCCAC ATCGGGTTCT CAAAATGGAA AGAATGGTTT ATGCCAAATC 
3251 ACTTTTCCTG TCTGAAGGAC CACTGAATGG TTTTGTTTTT CCATATTTTG 
3301 CATAGGACGC CCTTUVAGACT AGGTGACTTG GCAAACACAC AAGTGTTAGT 
3351 ATAATTCTTT GCTTCTGCTT CTTTTTGAAA ATCATGTTTA GATTTGATTT 
3401 TAAGTCAGAA ATTCACTGAA TGTCAGGTAA TCATTATGGA GGGAGATTTG 
3451 TGTGTCAACC AAAGTAATTG TCCCATGGCC CCAGGGTATT TCTGTTGTTT 
3501 CCCTGAAATT CTGCTTTTTT AGTCAGCTAG ATTGAAAACT CTGAACAGTA 
3551 GATGTTTATA TGGCAAAATG CAAGACAATC TATAAGGGAG ATTTTAAGGA 
3601 TTTTGAGATG AAAAAACAGA TGCTACTCAG GGGCTTTATG GACCATCCAT 
3651 CAATTCTGAA GTTCTGACTC TCCCATTACC CTTTCCCTGG TGTGGTCAGA 
3701 ACTCCAGGTC ACTGGAAGTT AGTGGAATCA TGTAGTTGAA TTCTTTACTT 
3751 CAAGACATTG TATTCTCTCC AGCTATCAAA ACATTAATGA TCTTTTATGT 
3801 CTTTTTTTTG TTATTGTTAT ACTTTAAGTT CTGGGGTACA TGTGCGGAAC 
3851 ATGTAGGTTT GTTACATAGG TATACATGTG CCATGGTGGT TTGCTGCACT 
3901 CATCAACCTG TCATCTACAT TCTTTTATGT CTGTCTTTCA AAGCAACACT 
3951 CTGTTCTTCT GAGTAGTGAA ATCAGGTCAA CTTTACCACC AGCCTCCATT 
4001 TTTAATATGC TTCACCATCA TCCAGCACCT ACTTAAGATT TATCTAGGGC 
4051 TCTGTGGTGA TGTTAGGACC CATAAAAGAA ATTTATGCCT TCCATATGTT 
4101 TGGTTACAGA TGGGAAATGG GAATGTTGAA GGACATGAAA GAAAGGATGT 
4151 TTACACATTA AGCATCAGTT CTGAAGCTAG ATTGTCTGAG TTTGAATCTT 
4201 AGCTCTTCCC TTTATTAGCT CTGTGACCTC GAGCTAGTTA CTTAAATGCT 
4251 CTGATCCTCT ATTTCCTGAT CAGTGAAACC TCCCTATTCA AATGTGTGAG 
4301 AGTTTAATAA ATTAGGACAC TTAAAAATGT TGGAGCAGTG CATAGCATGT 
4351 AGTGTTCAGT ACATGTTAAA TGTTGTTTTT TATTATGTAC AAACATGTGT 
4401 GGGCACAGAA TTTTAAATCA TCTCAACTTT TGAGAAATTT TGAGTTATCA 
4451 ACACCGTTCC CACAAGACAG TGGCAAAATT ATTGGTGAGA ATTAAACAGC 
4501 TGTTTCTCAG AGGAAGCAAT GGAGGCTTGC TGGGATAAAG GCATTTACTG 
4551 AGAGGCTGTT ACCTAGTGAG AGTGATGAAT TAATTAAAAT AGTCGAATCC 
4601 CTTTCTGACT GTCTCTGAAA GCTTCCGCTT TTATCTTTGA AGAGCAGAAT 
4651 TGTCACCCCA AGGACATTTA TTAATAAAAA GAACAACTGT CCAGTGCAAT 
4701 GAAGGCAAAG TCATAGGTCT CCCAAGTCTT ACCCCATTCC TGTGAAATAT 
4751 CAAGTTCTTG GCTTTTCTCT GTCATGTAGC CTCAACTTTC TCCGACCGGG 
4801 TGCATTTCTT TCTCTGGTTT CTAAATTGCC AGTGGCAAAT TTGGATCACT 
4851 TACTTAATAT CTGTTAAATT TTGTGACCCA ACAAAGTCTT TTAGCACTGT 
4901 GGTGTCAAAA AGAAAAACAC CTCCCAGGCA TATACATTTT ATAGATTCCT 
4951 GGAGAATGTT GCTCTCCAGC TCCATCCCCA CCCAATGAAA TATGATCCAG 
5001 AGAGTCTTGC AAAGAGACAA GCCTCATTTT CCACAATTAG CTCTAAAGTG 
5051 CCTCCAGGAA ATGATTTTCT CAGCTCATCT CTCTGTATTC CCTGTTTTGG 
5101 ATCACAGGGC AATCTGTTTA AATGACTAAT TACAGAAATC ATTAAAGGCA 
5151 CCAAGCAAAT GTCATCTCTG AATACACACA TCCCAAGCTT TACAAATCCT 
5201 GCCTGGCTTG ACAGTGATGA GGCCACTTAA CAGTCCAGCG CAGGCGGATG 
5251 TTAAAAAAAA TAAAAAGGTG ACCATCTGCG GTTTAGTTTT TTT^CTTTCT 
5301 GATTTCACAC TTAACGTCTG TCATTCTGTT ACTGGGCACC TGTTTAAATT 
5351 CTATTTTAAA ATGTTAATGA GTGTTGTTTA AAATAAAATC AGGAAAGAGA 
5401 GAAAAAAAAA AAAAAAAAAA AC 


BLAST Results 


No BLAST result 


Medline entries 
97121493: 

ZnT-3, a putative transporter of zinc into synaptic vesicles. 
96203098: 

ZnT-2, a mammalian protein that confers resistance to zinc by 
facilitating vesicular 
sequestration. 


Peptide information for frame 2 


ORF from 407 bp to 1366 bp; peptide length: 320 
Category: strong similarity to known protein 
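
The ORF coordinates and peptide lengths quoted in these reports are mutually consistent, and the relation can be sketched in a few lines of Python. The function name `orf_summary` is hypothetical and not part of the original analysis pipeline. It assumes 1-based, inclusive nucleotide coordinates that exclude the stop codon, an assumption that agrees with every ORF listed in this section.

```python
def orf_summary(start, end):
    """Return (peptide_length, reading_frame) for an ORF.

    Assumes 1-based inclusive coordinates that exclude the stop codon.
    The frame is 1, 2 or 3, counted from the first base of the insert.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return n_bases // 3, frame


# ORF from 407 bp to 1366 bp, reported as frame 2 with peptide length 320
print(orf_summary(407, 1366))
```

Applied to the other clones in this section, the same arithmetic gives 541 residues for 263 to 1885 bp and 282 residues for 56 to 901 bp.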


1 MYHCHSGSKP TEKGANEYAY AKWKLCSASA ICFIFMIAEV VGGHIAGSLA 
51 VVTDAAHLLI DLTSFLLSLF SLWLSSKPPS KRLTFGWHRA EILGALLSIL 
101 CIWVVTGVLV YLACERLLYP DYQIQATVMI IVSSCAVAAN IVLTVVLHQR 
151 CLGHNHKEVQ ANASVRAAFV HAPGDLFQSI SVLISALIIY FKPEYKIADP 
201 ICTFIFSILV LASTITILKD FSILLMEGVP KSLNYSGVKE LILAVDGVLS 
251 VHCLHIWSLT MNQVILSAHV ATAASRDSQV VRREIAKALS KSFTMHSLTI 
301 QMESPVDQDP DCLFCEDPCD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62f10, frame 2 

PIR:S70632 zinc transporter ZnT-2 - rat, N = 1, Score = 884, P = 
1.5e-88 

TREMBL:MMU76007_1 gene: "ZnT-3"; product: "ZnT-3"; Mus musculus zinc 
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 772, P = 
1.1e-76 

TREMBL:HSU76010_1 gene: "ZnT-3"; product: "ZnT-3"; Human putative zinc 
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 742, P = 
1.6e-73 

TREMBL:MMUZNT02_1 gene: "ZnT-3"; product: "zinc transporter"; Mus 
musculus zinc transporter (ZnT-3) gene, complete cds., N = 1, Score = 
715, P = 1.2e-70 

TREMBL:CET18D3_3 gene: "T18D3.3"; Caenorhabditis elegans cosmid T18D3, 
N = 1, Score = 699, P = 5.9e-69 


>PIR:S70632 zinc transporter ZnT-2 - rat 
Length = 359 

HSPs: 

Score = 884 (132.6 bits), Expect = 1.5e-88, P = 1.5e-88 
Identities = 171/326 (52%), Positives = 230/326 (70%) 

Query: 2 YHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLID 61 

+ +CH+ +E A+ KL ASAIC +FMI E++GG++A SLA++TDAAHLL D 

Sbjct: 34 HYCHAQKDSGSHPNSEKQRARRKLYVASAICLVFMIGEIIGGYLAQSLAIMTDAAHLLTD 93 

Query: 62 LTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYPD 121 

S L+SLFSLW+SS+P +K + FGW RAEILGALLS+L IWVVTGVLVYLA +RL+ D 
Sbjct: 94 FASMLISLFSLWVSSRPATKTMNFGWQRAEILGALLSVLSIWVVTGVLVYLAVQRLISGD 153 

Query: 122 YQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNH-------KEVQANASVRAAFVHAPG 174 

Y+I+ M+I S CAVA NI++ + LHQ GH+H + Q N SVRAAF+H G 

Sbjct: 154 YEIKGDTMLITSGCAVAVNIIMGLALHQSGHGHSHGHSHEDSSQQQQNPSVRAAFIHVVG 213 

Query: 175 DLFQSISVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLN 234 

DL QS+ VL++A IIYFKPEYK DPICTF+FSILVL +T+TIL+D ++LMEG PK ++ 
Sbjct: 214 DLLQSVGVLVAAYIIYFKPEYKYVDPICTFLFSILVLGTTLTILRDVILVLMEGTPKGVD 273 

Query: 235 YSGVKELILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFT 294 

++ VK L+L+VDGV ++H LHIW+LT+ Q +LS H+A A + D+Q V + L F 

Sbjct: 274 FTTVKNLLLSVDGVEALHSLHIWALTVAQPVLSVHIAIAQNVDAQAVLKVARDRLQGKFN 333 



Query: 295 MHSLTIQMESPVDQDPDCLFCEDPCD 320 

H++TIQ+ES + C C+ P + 
Sbjct: 334 FHTMTIQIESYSEDMKSCQECQGPSE 359 
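
The percentages printed after the identity and positive counts can be reproduced from the counts themselves. The sketch below is a minimal illustration, with a hypothetical helper name, of the integer arithmetic that appears to be used: the values shown in these reports match a truncated (rounded down) percentage rather than a rounded one, which is an inference from the printed numbers and not a statement about the BLAST implementation.

```python
def blast_percent(matches, aligned_length):
    """Percentage of aligned columns, truncated to a whole number."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Identities = 171/326 and Positives = 230/326 in the ZnT-2 alignment
print(blast_percent(171, 326), blast_percent(230, 326))
```

Note that 230/326 is about 70.6%, so a rounding convention would print 71% where the report shows 70%.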


Pedant information for DKFZphfbr2_62f10, frame 2 


Report for DKFZphfbr2_62f10.2 

[LENGTH] 320 

[MW] 35053.51 

[pI] 6.48 

[HOMOL] PIR:S70632 zinc transporter ZnT-2 - rat 3e-84 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YMR243c] 2e-16 

[FUNCAT] 13.01 homeostasis of metal ions [S. cerevisiae, YMR243c] 2e-16 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YMR243c] 2e-16 

[FUNCAT] 11.07 detoxification [S. cerevisiae, YMR243c] 2e-16 

[FUNCAT] 07.04.01 metal ion transporters (Cu, Fe, etc.) [S. cerevisiae, YMR243c] 2e-16 

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YOR316c] 3e-13 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YOR316c] 3e-13 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR205w] 4e-07 

[PIRKW] transmembrane protein 2e-30 

[PIRKW] mitochondrial inner membrane 6e-12 

[PIRKW] mitochondrion 6e-12 

[PIRKW] membrane protein 1e-11 

[SUPFAM] zinc transporter ZnT-2 2e-30 

[SUPFAM] membrane protein czcD 1e-11 

[PROSITE] MYRISTYL 4 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] TRANSMEMBRANE 5 

[KW] LOW_COMPLEXITY 8.12 % 


SEQ MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLI 
SEG xxx 
PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhh 
MEM MMMM^dMMMMMMMMM^mMMMMMMMMMMMMMMMMMMMM 

SEQ DLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYP 
SEG xxxxxxxxxxxxxxxxxxxxxxx . 
PRD hhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 
MEM MMMMMMMMMMMMM MMMMMMMMMMMMMMMFfl^MMMMMMMMMMMM 

SEQ DYQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNHKEVQANASVRAAFVHAPGDLFQSI 
SEG 
PRD cccccccceeeehhhhhhhhhhhhhhhhhcccccccccccccchhhhhhhhhhhhhchhh 
MEM MMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . . 

SEQ SVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKE 
SEG 
PRD hhhhhhhhhhcccceeeccchhhhhhhhhhhhhchhhhhhhheeeeeccccccchhhhhh 
MEM 

SEQ LILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI 
SEG 
PRD hhhhhhceeecccceeeeeccchhhhheeeeeccccchhhhhhhhhhhhhhhhcccccee 
MEM 

SEQ QMESPVDQDPDCLFCEDPCD 
SEG 
PRD eeeccccccccccccccccc 
MEM 


Prosite for DKFZphfbr2_62f10.2 


PS00001 162->166 ASN_GLYCOSYLATION PDOC00001 
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 
PS00004 81->85 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005 
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005 
PS00005 80->83 PKC_PHOSPHO_SITE PDOC00005 
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005 
PS00006 [illegible] CK2_PHOSPHO_SITE PDOC00006 
PS00007 13->21 TYR_PHOSPHO_SITE PDOC00007 
PS00008 7->13 MYRISTYL PDOC00008 
PS00008 42->48 MYRISTYL PDOC00008 
PS00008 94->100 MYRISTYL PDOC00008 
PS00008 228->234 MYRISTYL PDOC00008 
PS00013 125->136 PROKAR_LIPOPROTEIN PDOC00013 
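
The Prosite coordinates listed above for the N-glycosylation and protein kinase C sites can be reproduced with a simple pattern scan. The sketch below is illustrative only: the regular expressions are transcriptions of the published PROSITE patterns PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]), the helper name `scan` is hypothetical, and the peptide is the frame 2 translation of DKFZphfbr2_62f10 as given in this section. The coordinate convention (1-based start, end equal to start plus pattern length) is inferred from the table.

```python
import re

# Frame 2 peptide of DKFZphfbr2_62f10 (320 residues), as listed above.
PEPTIDE = (
    "MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLA"
    "VVTDAAHLLIDLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSIL"
    "CIWVVTGVLVYLACERLLYPDYQIQATVMIIVSSCAVAANIVLTVVLHQR"
    "CLGHNHKEVQANASVRAAFVHAPGDLFQSISVLISALIIYFKPEYKIADP"
    "ICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKELILAVDGVLS"
    "VHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI"
    "QMESPVDQDPDCLFCEDPCD"
)

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005
}


def scan(seq, regex):
    """Return (start, end) pairs for every match, overlapping ones included.

    start is 1-based; end is start plus the pattern length, which is the
    convention the report appears to use (for example 162->166).
    """
    hits = []
    for m in re.finditer("(?=(" + regex + "))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

The lookahead wrapper lets adjacent sites that share residues be reported separately, which a plain `re.finditer` over the bare pattern would not do.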


(No Pfam data available for DKFZphfbr2_62f10.2) 
DKFZphfbr2_62n10 


group: brain derived 

DKFZphfbr2_62n10 encodes a novel 541 amino acid protein with similarity to 
Plasmodium vivax reticulocyte-binding protein 1. 

The novel protein contains one leucine zipper, a motif involved in protein-protein interaction. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to reticulocyte-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="13" 
Insert length: 3522 bp 

Poly A stretch at pos. 3503, polyadenylation signal at pos. 3479 


1 GGGGCGTGTT GGCGGGATTC TGAACGCTGC CATGGCTCAG ACCGTGTAGA 
51 ATGTTACATT GTCGCTCACT CTGCCCATCA CGTGCCACAT TTGCTTGGGG 
101 AAGGTACGTC AGCCTGTCAT ATGCATCAAC AACCATGTAT TTTGTTCGAT 
151 TTGTATTGAT TTGTGGTTGA AGAATAATAG CCAGTGTCCA GCTTGCAGAG 
201 TCCCCATCAC TCCTGAAAAT CCTTGCAAAG AAATTATAGG AGGAACAAGT 
251 GAAAGTGAAC CTATGCTAAG CCATACGGTC AGGAAGCATC TTCGGAAAAC 
301 TAGACTTGAA TTACTACACA AAGAATATGA GGACGAAATA GATTGTTTAC 
351 AGAAAGAAGT AGAAGAGCTT AAGAGTAAAA ATCTCAGCTT GGAGTCACAG 
401 ATCAAAGCTA TTCTGGATCC TTTAACCTTG GTGCAGGGCA ACCAAAATGA 
451 AGACAAACAT CTAGTCACAG ATAATCCAAG TATAATTAAC CCAGAAACTG 
501 TAGCAGAGTG GAAGAAAAAA CTCAGAACAG CTAATGAAAT CTATGAAAAA 
551 GTGAAAGATG ATGTGGATAA GCTAAAGGAG GCAAATAAAA AATTGAAATT 
601 GGAAAATGGT GGTCTGGTGA GGGAGAATTT ACGACTGAAG GCTGTU^GTTG 
651 ATAACAGATC ACCTCAAAAG TTTGGAAGGT TTGCAGTTGC TGCTCTTCAG 
701 TCCAAAGTAG AACAGTATGA GCGTGAAACC AATCGCCTCA AGAAAGCCCT 
751 GGAACGAAGT GATAAGTATA TAGAGGAACT AGAATCTCAA GTTGCACAGC 
801 TAAAAAATTC AAGTGAAGAG AAAGAGGCTA TGAATTCCAT TTGCCAGACA 
851 GCACTTTCTG CAGATGGCAA AGGGAGCAAA GGCAGTGAGG AGGATGTGGT 
901 GTCAAAGAAT CAAGGCGATA GTGCCAGAAA GCAGCCTGGC TCATCCACCT 
951 CCAGTTCTTC TCACCTAGCG AAGCCTTCCA GCAGCAGACT GTGTGACACC 
1001 AGTTCTGCAA GGCAGGAAAG TACCAGCAAA GCAGACCTTA ACTGTTCTAA 
1051 GAACAAAGAC CTATATCAAG AACAGGTAGA AGTAATGTTA GATGTGACAG 
1101 ATACAAGTAT GGATACTTAT TTGGAAAGAG AATGGGGGAA TAAACCAAGT 
1151 GACTGTGTAC CCTACAAAGA TGAAGAACTT TATGATTTTC CAGCTCCTTG 
1201 TACTCCTTTG TCCCTTAGTT GCCTTCAGCT CAGTACTCCA GAAAATAGAG 
1251 AGAGCTCTGT GGTCCAAGCA GGAGGTTCCA AAAAGCACTC AAACCATCTC 
1301 AGAAAATTGG TGTTTGATGA TTTTTGTGAT TCTTCAAATG TTTCTAATAA 
1351 AGATTCTTCA GAAGATGATA TAAGTAGAAG TGAAAATGAG AAGAAATCAG 
1401 AATGTTTTTC TTCCACAAAG ACAGGATTTT GGGACTGTTG TTCCACAAGC 
1451 TATGCCCAAA ACTTAGATTT TGAAAGTTCA GAGGGGAACA CGATAGCAAA 
1501 TTCTGTTGGA GAAATATCTT CAAAATTGAG TGAGAAATCA GGCTTATGTT 
1551 TATCCAAAAG GTTGAATTCT ATTCGCTCTT TTGAAATGAA CCGGACAAGA 
1601 ACATCCAGTG AAGCATCGAT GGATGCTGCT TACCTTGACA AAATCTCTGA 
1651 GTTGGATTCA ATGATGTCAG AGTCAGACAA CAGCAAGAGC CCTTGTAATA 
1701 ACGGTTTTAA GTCACTGGAT TTGGATGGGT TATCAAAGTC ATCTCAAGGC 
1751 AGTGAATTTC TTGAGGAACC TGATAAGTTG GAAGAAAAAA CTGAGCTAAA 
1801 CCTTTCCAAA GGTTCTCTAA CTAATGATCA GTTAGAAAAT GGAAGTGAAT 
1851 GGAAACCCAC TTCTTTTTTT TCTCCTCTCT CCATCTGACC AAGAAATGAA 
1901 TGAAGATTTT TCACTCCATT CCAGTTCTTG TCCAGTAACT AATGAAATCA 
1951 AACCCCCAAG CTGCTTGTTT CAGACAGAGT TTTCCCAGGG CATTTTGTTA 
2001 AGCAGTTCAC ATCGACTATT GGAAGATCAA AGATTTGGGT CATCTTTGTT 
2051 TAAGATGTCC TCAGAGATGC ACAGTCTTCA TAACCACCTT CAGTCTCCTT 
2101 GGTCTACTTC CTTTGTGCCT GAAAAGAGGA ATAAAAATGT GAATCAATCA 
2151 ACAAAAAGAA AAATCCAGAG CAGCCTTTCC AGTGCCAGCC CATCAAAAGC 
2201 AACTAAAAGT TGACTCATTA GAAAGGTGTC ATTTGTGGTT TTGTCCTGAG 
2251 AGAAATAGAA AAGTTGTTAA AGTTACCTTT TTTCCTCATA AAAGTTCTAT 
2301 ACAAATTGGA ATTGATAATC TTTAGTCAAG TATCAAGTCA GGATGGTGGA 
2351 TTAACCTGTA CCCAGAATAC TTATTGTTCA TTTTGAAAAG ACTTTGTTCT 
2401 TTTCATTTTT ATTTGGGAGT CTTTGTGACC AGAGAAGTTA GGGAGGAGGT 
2451 TATTTTTGTG TTTTGGGGTT GGTTGGTTGG TTGGTTTTGT TTTTGGTTTT 
2501 GTTTTTTTAC TGAATTTGAT ATGTATCTCG GTTGGATATA CATTGTTTTT 
2551 TTAAAAAATG TTATTTAACT GTTAGATACA GTGGCCTGTT GATAAGCCCC 
2601 ACTTGTCTTC AGAACTTGGA TTTCTTAAAT AAAACTTTTA GTGTTGTCTA 
2651 TACACTGCTC AATAAGACAC TTGAGTTTAA GCTTTTCCCA GGGTGGAAAT 
2701 TATTTTACCT GTCCCTTTTT ATTTATGTTT AGTGATGGCC TAGTTTTTCT 
2751 GCAGGGCCAT GATGGAGAAA TAGCACTCTA GCCTTAGTCC AATATTGATT 
2801 TACTTTCTTT TTTTAGGTTT TATGTATATG TTTGCATTTT TTAGCATTGT 
2851 GTTTTGTCCA GTTTTGTGAA AATGTTCTGC TAGTATGAAA GAAAACATTT 
2901 TCTATATGAA GACATTTGTT TTATGTTAGG TAGCTTACAT TTTCTCCTCT 
2951 GCGTGTGTGT GTATGTGTGT AAAATCAGAA ATTTAGCATA CTATGGAAAG 
3001 AAGGCATGGA GCACTTGGGT TTAGAGGAAC CTAAAACATC ATAGCTTCAT 
3051 TGTTCCAGAT GTAACAGGTT TGAAAGAGCT CATCGCCAAG TTCTTGATCC 
3101 ACTTGCATTC CAGGGGAGTT CTCTTTTGAG TAGTATGTTT CTTGTTTGCA 
3151 TGTTCCTGTT CTTTGTGGAA ACTATGCATG GTAGCATTTT TGCTTGCTGT 
3201 GTTTTCCATA CTTAAGAAAA AGAGGTTTCA GTTGGCTGAT AGAATATCTT 
3251 TTATGTAGGA CAAAACTTTT CTGTGAAGAG TGTTGAGGGG GTGAAGATAG 
3301 GTAAGAGGTA AGCACAATTT TTAATTTAGG CTCTGAAAAA GTGTATTGTT 
3351 CTAAACGTAT TTGGTATGCC TATATAGGTC TTTAAAAATG GGTTTGTATG 
3401 CTGTTTAATG TGCACTGAAC ATTTTACATT AATATTGTAC TGTTTTACAT 
3451 TAATACTGCA TGCTTTTCTA TGTGAATTGA ATAAAGAATG TCATAAGCAC 
3501 TGGAAAAAAA AAAAAAAAAA AA 


BLAST Results 


Entry HS658254 from database EMBL: 
human STS SHGC-11774. 
Score = 1643, P = 8.0e-67, identities = 345/355 

Entry HS513217 from database EMBL: 
human STS SHGC-14656. 
Score = 1193, P = 5.8e-46, identities = 241/244 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 263 bp to 1885 bp; peptide length: 541 
Category: similarity to known protein 


1 MLSHTVRKHL RKTRLELLHK EYEDEIDCLQ KEVEELKSKN LSLESQIKAI 
51 LDPLTLVQGN QNEDKHLVTD NPSIINPETV AEWKKKLRTA NEIYEKVKDD 
101 VDKLKEANKK LKLENGGLVR ENLRLKAEVD NRSPQKFGRF AVAALQSKVE 
151 QYERETNRLK KALERSDKYI EELESQVAQL KNSSEEKEAM NSICQTALSA 
201 DGKGSKGSEE DVVSKNQGDS ARKQPGSSTS SSSHLAKPSS SRLCDTSSAR 
251 QESTSKADLN CSKNKDLYQE QVEVMLDVTD TSMDTYLERE WGNKPSDCVP 
301 YKDEELYDFP APCTPLSLSC LQLSTPENRE SSVVQAGGSK KHSNHLRKLV 
351 FDDFCDSSNV SNKDSSEDDI SRSENEKKSE CFSSTKTGFW DCCSTSYAQN 
401 LDFESSEGNT IANSVGEISS KLSEKSGLCL SKRLNSIRSF EMNRTRTSSE 
451 ASMDAAYLDK ISELDSMMSE SDNSKSPCNN GFKSLDLDGL SKSSQGSEFL 
501 EEPDKLEEKT ELNLSKGSLT NDQLENGSEW KPTSFFSPLS I 

BLASTP hits 

Entry A42771 from database PIR: 

reticulocyte-binding protein 1 - Plasmodium vivax 

Score = 127, P = 3.7e-08, identities = 68/300, positives = 145/300 

Entry RBP1_PLAVB from database SWISSPROT: 
RETICULOCYTE BINDING PROTEIN 1 PRECURSOR. 

Score = 127, P = 3.9e-08, identities = 68/300, positives = 145/300 

Entry MMDSPPG_1 from database TREMBL: 

gene: "DSPP"; product: "dentin sialophosphoprotein"; Mus musculus DSPP 
gene 

Score = 160, P = 5.2e-08, identities = 87/373, positives = 146/373 


Alert BLASTP hits for DKFZphfbr2_62n10, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_62n10, frame 2 
Report for DKFZphfbr2_62n10.2 

[LENGTH] 541 

[MW] 60533.06 

[pI] 5.10 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 3e-05 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 3e-05 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 18 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 14 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 9.24 % 

[KW] COILED_COIL 22.55 % 


SEQ MLSHTVRKHLRKTRLELLHKEYEDEIDCLQKEVEELKSKNLSLESQIKAILDPLTLVQGN 
SEG 
PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhcccccccccc 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QNEDKHLVTDNPSIINPETVAEWKKKLRTANEIYEKVKDDVDKLKEANKKLKLENGGLVR 
SEG xxxxxxxxxxxxxxxxxxxx ..... 
PRD cccceeeeeccccccccchhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccceee 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ ENLRLKAEVDNRSPQKFGRFAVAALQSKVEQYERETNRLKKALERSDKYIEELESQVAQL 
SEG 
PRD ehhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^^ 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KNSSEEKEAMNSICQTALSADGKGSKGSEEDVVSKNQGDSARKQPGSSTSSSSHLAKPSS 
SEG . . xxxxxxxxxxxxxx 
PRD hcchhhhhhhhhhhhhhhccccccccccceeeeecccccccccccccccccccccccccc 
COILS CCCCCC 

SEQ SRLCDTSSARQESTSKADLNCSKNKDLYQEQVEVMLDVTDTSMDTYLEREWGNKPSDCVP 
SEG x 
PRD ccccccccccccccccccccccccchhhhhhhhhcccccccccchhhhhhhccccccccc 
COILS 

SEQ YKDEELYDFPAPCTPLSLSCLQLSTPENRESSVVQAGGSKKHSNHLRKLVFDDFCDSSNV 
SEG 
PRD cccccccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccc 
COILS 

SEQ SNKDSSEDDISRSENEKKSECFSSTKTGFWDCCSTSYAQNLDFESSEGNTIANSVGEISS 
SEG 
PRD cccccccchhhhhccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

SEQ KLSEKSGLCLSKRLNSIRSFEMNRTRTSSEASMDAAYLDKISELDSMMSESDNSKSPCNN 
SEG 
PRD ccccccccchhhhhcccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 
COILS 

SEQ GFKSLDLDGLSKSSQGSEFLEEPDKLEEKTELNLSKGSLTNDQLENGSEWKPTSFFSPLS 
SEG . . xxxxxxxxxxxxxxx 
PRD ccccccccccccccccceeecccchhhhhhhhhcccccccccccccccccccccccccc^ 
COILS 

Prosite for DKFZphfbr2_62n10.2 


PS00001 40->44 ASN_GLYCOSYLATION PDOC00001 
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001 
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 
PS00001 [illegible] ASN_GLYCOSYLATION PDOC00001 
PS00001 [illegible] ASN_GLYCOSYLATION PDOC00001 
PS00001 [illegible] ASN_GLYCOSYLATION PDOC00001 
PS00001 [illegible] ASN_GLYCOSYLATION PDOC00001 
PS00004 [illegible] CAMP_PHOSPHO_SITE PDOC00004 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible] PKC_PHOSPHO_SITE PDOC00005 
PS00005 [illegible]->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 361->364 PKC_PHOSPHO_SITE PDOC00005 
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005 
PS00005 419->422 PKC_PHOSPHO_SITE PDOC00005 
PS00005 423->426 PKC_PHOSPHO_SITE PDOC00005 
PS00005 431->434 PKC_PHOSPHO_SITE PDOC00005 
PS00005 436->439 PKC_PHOSPHO_SITE PDOC00005 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006 
PS00006 285->289 CK2_PHOSPHO_SITE PDOC00006 
PS00006 324->328 CK2_PHOSPHO_SITE PDOC00006 
PS00006 361->365 CK2_PHOSPHO_SITE PDOC00006 
PS00006 365->369 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006 
PS00006 414->418 CK2_PHOSPHO_SITE PDOC00006 
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006 
PS00006 462->466 CK2_PHOSPHO_SITE PDOC00006 
PS00006 469->473 CK2_PHOSPHO_SITE PDOC00006 
PS00007 294->302 TYR_PHOSPHO_SITE PDOC00007 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 226->232 MYRISTYL PDOC00008 
PS00008 292->298 MYRISTYL PDOC00008 
PS00008 408->414 MYRISTYL PDOC00008 
PS00008 427->433 MYRISTYL PDOC00008 
PS00008 489->495 MYRISTYL PDOC00008 
PS00008 517->523 MYRISTYL PDOC00008 
PS00013 310->321 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 104->126 LEUCINE_ZIPPER PDOC00029 


(No Pfam data available for DKFZphfbr2_62n10.2) 
DKFZphfbr2_62o17 


group: metabolism 

DKFZphfbr2_62o17.2 encodes a novel 282 amino acid protein with weak similarity to the 
apolipoprotein E receptor. 

The new protein contains a leucine zipper for protein-protein interaction, and three LDL- 
receptor class A domain (LDLRA_1) patterns. In LDL-receptors the class A domains form the 
binding site for LDL and calcium. The acidic residues between the fourth and sixth cysteines 
are important for high-affinity binding of positively charged sequences in LDLR's ligands. 

The new protein can find application in modulation of cholesterol binding and transport by 
LDL-receptors and LDL-binding proteins. 


similarity to apolipoprotein E receptor 

complete cDNA, complete cds, start at bp 56 matches Kozak consensus 
ANCatg, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 1260 bp 

Poly A stretch at pos. 1240, polyadenylation signal at pos. 1218 
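
The statement that the start codon at bp 56 matches the consensus ANCatg can be checked directly against the first bases of the insert. The sketch below is a minimal illustration with a hypothetical function name; it tests only the pattern as written in the annotation (A at position -3, any base at -2, C at -1, followed by ATG) and makes no claim about how the original annotation pipeline scored Kozak contexts.

```python
import re

# First 60 bases of the DKFZphfbr2_62o17 insert, as listed in this section.
HEAD = "GGGGGATAAGAGAGCGGTCTGGACAGCGCGTGGCCGGCGCCGCTGTGGGGACAGCATGAG"


def matches_anc_atg(seq, start_1based):
    """True if the ATG at start_1based is preceded by A-N-C (ANCatg)."""
    i = start_1based - 1          # convert to a 0-based index
    if i < 3:
        return False              # not enough upstream sequence
    context = seq[i - 3:i + 3].upper()
    return re.fullmatch(r"A[ACGT]CATG", context) is not None


print(matches_anc_atg(HEAD, 56))
```

For this insert the six-base context around bp 56 is AGCATG, which satisfies the pattern.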


1 GGGGGATAAG AGAGCGGTCT GGACAGCGCG TGGCCGGCGC CGCTGTGGGG 
51 ACAGCATGAG CGGCGGTTGG ATGGCGCAGG TTGGAGCGTG GCGAACAGGG 
101 GCTCTGGGCC TGGCGCTGCT GCTGCTGCTC GGCCTCGGAC TAGGCCTGGA 
151 GGCCGCCGCG AGCCCGCTTT CCACCCCGAC CTCTGCCCAG GCCGCAGGCC 
201 CCAGCTCAGG CTCGTGCCCA CCCACCAAGT TCCAGTGCCG CACCAGTGGC 
251 TTATGCGTGC CCCTCACCTG GCGCTGCGAC AGGGACTTGG ACTGCAGCGA 
301 TGGCAGCGAT GAGGAGGAGT GCAGGATTGA GCCATGTACC CAGAAAGGGC 
351 AATGCCCACC GCCCCCTGGC CTCCCCTGCC CCTGCACCGG CGTCAGTGAC 
401 TGCTCTGGGG GAACTGACAA GAAACTGCGC AACTGCAGCC GCCTGGCCTG 
451 CCTAGCAGGC GAGCTCCGTT GCACGCTGAG CGATGACTGC ATTCCACTCA 
501 CGTGGCGCTG CGACGGCCAC CCAGACTGTC CCGACTCCAG CGACGAGCTC 
551 GGCTGTGGAA CCAATGAGAT CCTCCCGGAA GGGGATGCCA CAACCATGGG 
601 GCCCCCTGTG ACCCTGGAGA GCGTCACCTC TCTCAGGAAT GCCACAACCA 
651 TGGGGCCCCC TGTGACCCTG GAGAGTGTCC CCTCTGTCGG GAATGCCACA 
701 TCCTCCTCTG CCGGAGACCA GTCTGGAAGC CCAACTGCCT ATGGGGTTAT 
751 TGCAGCTGCT GCGGTGCTCA GTGCAAGCCT GGTCACCGCC ACCCTCCTCC 
801 TTTTGTCCTG GCTCCGAGCC CAGGAGCGCC TCCGCCCACT GGGGTTACTG 
851 GTGGCCATGA AGGAGTCCCT GCTGCTGTCA GAACAGAAGA CCTCGCTGCC 
901 CTGAGGACAA GCACTTGCCA CCACCGTCAC TCAGCCCTGG GCGTAGCCGG 
951 ACAGGAGGAG AGCAGTGATG CGGATGGGTA CCCGGGCACA CCAGCCCTCA 
1001 GAGACCTGAG CTCTTCTGGC CACGTGGAAC CTCGAACCCG AGCTCCTGCA 
1051 GAAGTGGCCC TGGAGATTGA GGGTCCCTGG ACACTCCCTA TGGAGATCCG 
1101 GGGAGCTAGG ATGGGGAACC TGCCACAGCC AGAACCGAGG GGCTGGCCCC 
1151 AGGCAGCTCC CAGGGGGTAG GACGGCCCTG TGCTTAAGAC ACTCCTGCTG 
1201 CCCCGTCTGA GGGTGGCGAT TAAAGTTGCT TCACATCCTC AAAAAAAAAA 
1251 AAAAAAAAAC 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 56 bp to 901 bp; peptide length: 282 

Category: similarity to known protein 

Classification: unset 

Prosite motifs: LDLRA_1 (67-90) 

LDLRA_1 (67-90) 

LDLRA_1 (145-168) 
LEUCINE_ZIPPER (17-39) 


1 MSGGWMAQVG AWRTGALGLA LLLLLGLGLG LEAAASPLST PTSAQAAGPS 
51 SGSCPPTKFQ CRTSGLCVPL TWRCDRDLDC SDGSDEEECR IEPCTQKGQC 
101 PPPPGLPCPC TGVSDCSGGT DKKLRNCSRL ACLAGELRCT LSDDCIPLTW 
151 RCDGHPDCPD SSDELGCGTN EILPEGDATT MGPPVTLESV TSLRNATTMG 
201 PPVTLESVPS VGNATSSSAG DQSGSPTAYG VIAAAAVLSA SLVTATLLLL 
251 SWLRAQERLR PLGLLVAMKE SLLLSEQKTS LP 
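
The peptide listed above follows from the nucleotide sequence by translating the ORF in its reading frame. The sketch below is a self-contained illustration using the standard genetic code; the function name `translate` and the compact table construction are choices made here for illustration, not a description of the software used for the original annotation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string from its first base, stopping at a stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 56 to 85 of the DKFZphfbr2_62o17 insert (the first ten codons of the ORF)
print(translate("ATGAGCGGCGGTTGGATGGCGCAGGTTGGA"))
```

The first ten codons give MSGGWMAQVG, in agreement with the start of the listed peptide.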

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62o17, frame 2 

TREMBL:AF110520_6 product: "NG29"; Mus musculus major 
histocompatibility complex region NG27, NG28, RPS28, NADH 
oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin, 
RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 
genes, complete cds; Sacm21 gene, partial cds; and unknown gene., N = 
1, Score = 733, P = 1.5e-72 

PIR:JE0237 apolipoprotein E receptor 2 precursor - mouse, N = 2, Score 
= 290, P = 1.1e-26 

TREMBL:HSZ75190_1 product: "apolipoprotein E receptor 2 906"; 
H. sapiens mRNA for apolipoprotein E receptor 2, N = 1, Score = 279, P = 
1.8e-23 

>TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility 
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, 
Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta 
1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene, 
partial cds; and unknown gene. 
Length = 260 

HSPs: 

Score = 733 (110.0 bits), Expect = 1.5e-72, P = 1.5e-72 
Identities = 157/276 (56%), Positives = 178/276 (64%) 

Query: 6 MAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQCRTSG 65 
MA+ GA R ALGL L LL GL GLEAA +P T Q +G + SCP FQC TSG 
Sbjct: 1 MARGGAGRAVALGLVLRLLFGLRTGLEAAPAPAHT--RVQVSGSRADSCPTDTFQCLTSG 58 

Query: 66 LCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGTDKKLR 125 
CVPL+WRCD D DCSDGSDEE+CRIE C Q GQC P LPC C +S CS +DK L 
Sbjct: 59 

Query: 126 NCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATTMGPPV 185 
NCSR C EL C L D CIP TWRCDGHPDC DSSDEL C T+ 
Sbjct: 118 

Query: 186 TLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSASLVTA 245 
++ + NATT T+E+ S NT +SAGD S +P+AYGVIAAA VLSA LV+A 
Sbjct: 164 EIOKIFQEENATTTRISTTMENETSFRNVTFTSAGDSSRNPSAYGVIAAAGVLSAILVSA 223 

Query: 246 TLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSL 281 
TLL+L LR Q LP GLLVA+KESLLLSE+KTSL 
Sbjct: 224 TLLILLRLRGQGYLPPPGLLVAVKESLLLSERKTSL 259 

Pedant information for DKFZphfbr2_62o17, frame 2 

Report for DKFZphfbr2_62o17.2 

[LENGTH] 282 

[MW] 28991.19 

[pI] 4.61 

[HOMOL] TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility 
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein, 
BING1, tapasin, RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 genes, 
complete cds; Sacm2l gene, partial cds; and unknown gene. 5e-55 

[BLOCKS] BL01209 LDL-receptor class A (LDLRA) domain proteins 

[SCOP] d1ajj_ 7.11.1.1.1 Ligand-binding domain of low-density lipoprotei 2e-10 

[PIRKW] duplication 1e-19 

[PIRKW] heterodimer 6e-18 

[PIRKW] endocytosis 4e-18 

[PIRKW] neparcin suXi.oue ^q~xc 

[PIRKW] vjjUL. xe— xy 

[PIRKW] transmembrane protein 1e-19 

[PIRKW] coated pits 4e-18 

[PIRKW] xaLty a^xu nic uaijoxxsill X6 XI? 

[PIRKW] o pxoucxn'cvupxea recspcor xe xu 

[PIRKW] receptor 1e-19 

[PIRKW] glycoprotein 1e-19 

[PIRKW] lipid transport 4e-18 

[PIRKW] x»L/t» oe 

[PIRKW] calcium binding 6e-18 

[PIRKW] extracellular protein 6e-13 

[PIRKW] alternative splicing le-lS 

[PIRKW] extracellular matrix 3e-10 

[PIRKW] chondroitin sulfate proteoglycan 2e-12 

[PIRKW] cholesterol 4e-18 

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 1e-10 

[SUPFAM] LDL receptor YWTD-containing repeat homology 1e-19 

[SUPFAM] trypsin homology 6e-13 

[SUPFAM] alpha-2-macroglobulin receptor 6e-18 

[SUPFAM] LDL receptor 1e-19 

[SUPFAM] LDL receptor ligand-binding repeat homology 1e-19 

[SUPFAM] EGF homology 1e-19 

[PROSITE] LDLRA_1 3 

[PROSITE] LEUCINE_ZIPPER 1 

[PFAM] Low-density lipoprotein receptor domain class A 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] SIGNAL_PEPTIDE 31 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 22.34 % 


SEQ MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQ 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccceee 


MEM 


SEQ CRTSGLCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGT 

SEG xxxxxxxxxxx 

PRD ecccccceeeeecccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ DKKLRNCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATT 

SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ MGPPVTLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSA 

SEG ....................................................xxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

MEM MMMMMMM 

SEQ SLVTATLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSLP 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhcccccc 

MEM MMMMM^01MMM 


Prosite for DKFZphfbr2_62o17.2 


PS01209 67->90 LDLRA_1 PDOC00929 
PS01209 67->90 LDLRA_1 PDOC00929 
PS01209 145->168 LDLRA_1 PDOC00929 
PS00029 17->39 LEUCINE_ZIPPER PDOC00029 


Pfam for DKFZphfbr2_62o17.2 


HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtY tD . WNHvpqClpC . t rCePEMGQYMvqPCTwTQNT . VC* 
CP+ ++ + + C+P RC+ ++ +C + ++ +C 
Query 54 CPPTKFQCRTS— GLCVPLTWRCDR— DL DCSDGSDEEEC 89 
89 
HMM_NAME Low-density lipoprotein receptor domain class A 

HMM *tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 

C P +FQC+++ C+P+ W+CD D DC D+SDE E+C+ 
Query 52 GSCP-PTKFQCRTSG-LCVPLTWRCDRDLDCSDGSDE — EECRI 91 

54.99 (bits) f: 130 t: 169 Target: dkfzphfbr2_62o17.2 similarity to apolipoprotein E 
receptor 

Alignment to HMM consensus : 
Query *tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 
C + E +C + CIP+ W+CDG PDC D SDE ++C+ 

dkfzphfbr2 130 LACL-AGELRCTLSD-DCIPLTWRCDGHPDCPDSSDE—LGCGT 169 
DKFZphfbr2_64a15 


group: nucleic acid management 

DKFZphfbr2_64a15 encodes a novel 255 amino acid protein with strong similarity to inorganic 
pyrophosphatases. 

Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the enzyme responsible for the hydrolysis of 
pyrophosphate (PPi) which is formed as the product of the many biosynthetic reactions that 
utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium 
conferring the highest activity. 

The new protein can find application as a new enzyme for biotechnologic processes. 


strong similarity to inorganic pyrophosphatases 

unspliced intron 212-256, see EST HS1190948 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1188 bp 

Poly A stretch at pos. 1170, polyadenylation signal at pos. 1151 

1 GGGGGTTGGG GACCAGTGCA GGGACCGGGT CGCGCCGTGC TATGGCCCTG 
51 TACCACACTG AGGAGCGCGG CCAGCCCTGC TCGCAGAATT ACCGCCTCTT 
101 CTTTAAGAAT GTAACTGGTC ACTACATTTC CCCCTTTCAT GATATTCCTC 
151 TGAAGGTGAA CTCTAAAGAG GACACTGAGG CTCAAGGCAT TTTTATAGAC 
201 TTGTCTAAGA TCTGGAAAAT GGCATTCCTA TGAAGAAAGC ACGAAATGAT 
251 GAATATGAGA ATCTGTTTAA TATGATTGTA GAAATACCTC GGTGGACAAA 
301 GGCTAAAATG GAGATTGCCA CCAAGGAGCC AATGAATCCC ATTAAACAAT 
351 ATGTAAAGGA TGGAAAGCTA CGCTATGTGG CGAATATCTT CCCTTACAAG 
401 GGTTATATAT GGAATTATGG TACCCTCCCT CAGACTTGGG AAGATCCCCA 
451 TGAAAAAGAT AAGAGCACGA ACTGCTTTGG AGATAATGAT CCTATTGATG 
501 TTTGCGAAAT AGGCTCAAAG ATTCTTTCTT GTGGAGAAGT TATTCATGTG 
551 AAGATCCTTG GAATTTTGGC TCTTATTGAT GAAGGTGAAA CAGATTGGAA 
601 ATTAATTGCT ATCAATGCGA ATGATCCTGA AGGCTCAAAG TTTCATGATA 
651 TTGATGATGT TAAGAAGTTC AAACCGGGTT ACCTGGAAGC TACTCTTAAT 
701 TGGTTTAGAT TATGTAAGGT ACCAGATGGA AAACCAGAAA ACCAGTTTGC 
751 TTTTAATGGA GAATTCAAAA ACAAGGCTTT TGCTCTTGAA GTTATTAAAT 
801 CCACTCATCA ATGTTGGAAA GCATTGCTTA TGAAGAACTG TAATGGAGGA 
851 GCTACAAATT GCACAAACGT GCAGATATCT GATAGCCCTT TCCGTTGCAC 
901 TCAAGAGGAA GCAAGATCAT TAGTTGAATC GGTATCATCT TCACCAAATA 
951 AAGAAAGTAA TGAAGAAGAG CAAGTGTGGC ACTTCCTTGG CAAGTGATTG 
1001 AAACATCTGA AATTCTGCTG TCAAGATTCC CATCTCTAAG GACTCCAAGA 
1051 CTCTTTTTCC CCAAGTGCTA GAGACAAGGG GGTCTATGAG CATTTACTGA 
1101 CTTCCTGTTA AAACTTCATT TTTTCAAACT TTTTGAGCTA TGCAATATAT 
1151 AAATAAACAG TAAGAATTTT AAAAAAAAAA AAAAAAAA 
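
The poly A stretch and polyadenylation signal positions quoted for this insert can be located with a short scan of the 3' end. The sketch below is an illustration under stated assumptions: the function name and the minimum run length of ten adenines are hypothetical choices, the signal is taken to be AATAAA or its common variant ATTAAA, and positions are reported as zero-based offsets, which is the convention that appears to reproduce the values 1151 and 1170 given for DKFZphfbr2_64a15.

```python
import re

# Last 38 bases of the DKFZphfbr2_64a15 insert (bases 1151 to 1188 as listed).
TAIL = "AAATAAACAGTAAGAATTTTAAAAAAAAAAAAAAAAAA"


def find_polya_features(seq, offset=0, min_run=10):
    """Return (signal_pos, polya_pos) as zero-based offsets, or None if absent.

    signal_pos: first AATAAA or ATTAAA hexamer in seq.
    polya_pos:  first run of at least min_run adenines in seq.
    offset:     number of insert bases that precede seq.
    """
    seq = seq.upper()
    signal = re.search(r"AATAAA|ATTAAA", seq)
    polya = re.search("A{%d,}" % min_run, seq)
    signal_pos = offset + signal.start() if signal else None
    polya_pos = offset + polya.start() if polya else None
    return signal_pos, polya_pos


# 1150 bases precede the tail fragment used here.
print(find_polya_features(TAIL, offset=1150))
```

On a full-length insert the first hexamer found may lie far upstream of the poly A tail, so a practical scan would restrict the search to a window immediately 5' of the tail.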


BLAST Results 


Entry HSPPASEMR from database EMBL: 

H. sapiens partial mRNA for pyrophosphatase. 

Score = 1706, P = 1.6e-70, identities = 342/343 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 230 bp to 994 bp; peptide length: 255 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: PPASE (85-92) 
1 MKKARNDEYE NLFNMIVEIP RWTKAKMEIA TKEPMNPIKQ YVKDGKLRYV 
51 ANIFPYKGYI WNYGTLPQTW EDPHEKDKST NCFGDNDPID VCEIGSKILS 
101 CGEVIHVKIL GILALIDEGE TDWKLIAINA NDPEASKFHD IDDVKKFKPG 
151 YLEATLNWFR LCKVPDGKPE NQFAFNGEFK NKAFALEVIK STHQCWKALL 
201 MKNCNGGATN CTNVQISDSP FRCTQEEARS LVESVSSSPN KESNEEEQVW 
251 HFLGK 

BLASTP hits 

Entry IPYR_KLULA from database SWISSPROT: 

INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO- 
HYDROLASE) (PPASE). 

Score = 689, P = 6.0e-68, identities = 128/248, positives = 170/248 

Entry A45153 from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - bovine 

Score = 862, P = 2.8e-86, identities = 146/226, positives = 190/226 
Entry AF085600_1 from database TREMBLNEW: 

gene: "Nurf-38"; product: "inorganic pyrophosphatase NURF-38"; 
Drosophila melanogaster inorganic pyrophosphatase NURF-38 (Nurf-38) 
gene, complete cds. 

Score = 731, P = 2.1e-72, identities = 134/248, positives = 177/248 
Entry PWBY from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - yeast (Saccharomyces 
cerevisiae) 

Score = 688, P = 7.7e-68, identities = 133/251, positives = 174/251 


Alert BLASTP hits for DKFZphfbr2_64a15, frame 2 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) 
(PYROPHOSPHATE PHOSPHO- HYDROLASE) (PPASE)., N = 1, Score = 731, P = 
2.4e-72 


>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE 
PHOSPHO- HYDROLASE) (PPASE) . 
Length = 290 

HSPs: 

Score = 731 (109.7 bits), Expect = 2.4e-72, P = 2.4e-72 
Identities = 134/248 (54%), Positives = 177/248 (71%) 

Query: 7 DEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYIWNYGTL 66 

+E + ++NM+VE+PRWT AKMEI+ K PMNPIKQ +K GKLR+VAN FP+KGYIWNYG L 
Sbjct: 40 NEEKTIYNMVVEVPRWTNAKMEISLKTPMNPIKQDIKKGKLRFVANCFPHKGYIWNYGAL 99 

Query: 67 PQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGETDWKLI 126 

PQTWE+P + ST C GDNDPIDV EIG ++ G+V+ VK+LG ALIDEGETDWK+I 
Sbjct: 100 PQTWENPDHIEPSTGCKGDNDPIDVIEIGYRVAKRGDVLKVKVLGQFALIDEGETDWKII 159 

Query: 127 AINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFKNKAFAL 186 

AI+ NDP ASK +DI DV ++ PG L AT+ WF++ K+PDGKPENQFAFNG+ KN FA 
Sbjct: 160 AIDVNDPLASKVNDIADVDQYFPGLLRATVEWFKIYKIPDGKPENQFAFNGDAKNADFAN 219 

Query: 187 EVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARS-LVESVSSSPNKESNE 245 

+I TH+ W+ L+ ++ G+ + TN+ +S +EEA L E+ +E ++ 

Sbjct: 220 TIIAETHKFWQNLVHQSPASGSISTTNITNRNSEHVIPKEEAEKILAEAPEGGQVEEVSD 279

Query: 246 EEQVWHFL 253
WHF+ 

Sbjct: 280 TVDTWHFI 287 
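
The percentages printed in the HSP summary lines follow directly from the reported counts. The following is a minimal sketch only: the helper name hsp_percentages is hypothetical, and the use of truncation rather than rounding is an assumption that happens to reproduce the figures printed in this listing.

```python
import re


def hsp_percentages(summary_line: str) -> dict:
    """Recompute integer percentages from an 'Identities = a/b, Positives = c/d' line."""
    result = {}
    pattern = r"(Identities|Positives)\s*=\s*(\d+)/(\d+)"
    for label, num, den in re.findall(pattern, summary_line):
        # Integer division truncates: 177/248 = 71.37% is reported as 71%.
        result[label] = 100 * int(num) // int(den)
    return result


print(hsp_percentages("Identities = 134/248 (54%), Positives = 177/248 (71%)"))
```

Under the truncation assumption the same helper also gives 62% for 66/105, whereas rounding would give 63%.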


Peptide information for frame 3 


ORF from 42 bp to 230 bp; peptide length: 63 
Category: strong similarity to known protein
Classification: unset 


1 MALYHTEERG QPCSQNYRLF FKNVTGHYIS PFHDIPLKVN SKEDTEAQGI 
51 FIDLSKIWKM AFL 
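
The peptide length reported for an ORF can be cross-checked against its coordinates. The sketch below assumes that both coordinates are inclusive and that the stated range does not include the stop codon; the function name orf_peptide_length is hypothetical and is not part of the analysis software used for this listing.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of residues encoded by an ORF with inclusive coordinates."""
    n_nucleotides = end_bp - start_bp + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three nucleotides")
    return n_nucleotides // 3


# ORF from 42 bp to 230 bp, as reported above for frame 3
print(orf_peptide_length(42, 230))
```

For the ORF from 42 bp to 230 bp this gives 189 nucleotides, that is, 63 codons, in agreement with the reported peptide length.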
BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64al5, frame 3 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1)
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 118, P =
8.8e-07

PIR:A45153 inorganic pyrophosphatase (EC 3.6.1.1) - bovine, N = 1,
Score = 113, P = 3.1e-06 

TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; 
Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds., N
= 1, Score = 106, P = 1.8e-05


>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE).
Length = 290

HSPs : 


Score = 118 (17.7 bits), Expect = 8.8e-07, P = 8.8e-07
Identities = 23/43 (53%), Positives = 29/43 (67%)

Query: 1 MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKE 43 

MALY T E+G S +Y L+FKN G+ ISP HDIPL N ++ 
Sbjct: 1 MALYETVEKGAKNSPSYSLYFKNKCGNVISPMHDIPLYANEEK 43 

Pedant information for DKFZphfbr2_64al5, frame 2 


Report for DKFZphfbr2_64al5.2

[LENGTH] 255 

[MW] 29177.34

[pI] 5.67

[HOMOL] TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; Homo
sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds. 2e-93

[FUNCAT] 01.04.01 phosphate utilization [S. cerevisiae, YBR011c] 9e-73

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR011c] 9e-73

[FUNCAT] 02.99 other energy generation activities [S. cerevisiae, YMR267w] 1e-58

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR267w] 1e-58

[FUNCAT] 1 genome replication, transcription, recombination and repair [M.
genitalium, MG351] 1e-06

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0124] 2e-06

[BLOCKS] BL00387D
[BLOCKS] BL00387C
[BLOCKS] BL00387B
[BLOCKS] BL00387A

[SCOP] d1wgja 2.29.5.1.1 Inorganic pyrophosphatase (baker's yeas 1e-113

[EC] 3.6.1.1 Inorganic pyrophosphatase 7e-52

[PIRKW] mitochondrion 3e-57

[PIRKW] hydrolase 7e-92

[PIRKW] homodimer 2e-71

[SUPFAM] inorganic pyrophosphatase 7e-92

[PROSITE] PPASE 1

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.27 % 

SEQ MKKARNDEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYI

SEG 

1hukB EGGCSCEEEEEEEETTTbCBCEEETTTTTTTCEEECEETTEECBCCBBTTBTTbT


SEQ WNYGTLPQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGE 

SEG 

1hukB CEEEETTTTCBTTTTEETTTTEBCCCBCCEEEECCCCCCTTTEEEEEEEEEEEEETTTTB

SEQ TDWKLIAINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFK 

SEG 

1hukB CEEEEEEEETTTTTGGGCCCHHHHHHHTTTHHHHHHHHHHHHCGGGCCCCCCBCGGGCCB

SEQ NKAFALEVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARSLVESVSSSPN

SEG *•••••••••••••••••••••••••.••••••.•'•.••••«•,,,,,,,, xxxxxxxxx 

1hukB CHHHHHHHHHHHHHHHHHHHHCTTTTTTTCCCBTTTTTTT
SEQ KESNEEEQVWHFLGK

SEG xxxxxxx 

1hukB


Prosite for DKFZphfbr2_64al5.2
PS00387 85->92 PPASE PDOC00325 
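
Each Prosite hit line in these reports carries four fields: pattern accession, matched residue range, pattern name and documentation accession. As an illustration only, such a line can be split into its fields as sketched below; the function name parse_prosite_hit and the tuple layout are assumptions made for this example.

```python
import re

# Accession, start->end, pattern name, documentation accession
PROSITE_HIT = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)\s+(PDOC\d{5})$")


def parse_prosite_hit(line: str):
    """Split a hit line into (accession, start, end, name, doc); None if it does not match."""
    match = PROSITE_HIT.match(line.strip())
    if match is None:
        return None
    accession, start, end, name, doc = match.groups()
    return accession, int(start), int(end), name, doc


print(parse_prosite_hit("PS00387 85->92 PPASE PDOC00325"))
```

Lines damaged during text extraction simply fail to match and are returned as None, which makes them easy to flag for manual review.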

(No Pfam data available for DKFZphfbr2_64al5.2)

Pedant information for DKFZphfbr2_64al5, frame 3

Report for DKFZphfbr2_64al5.3

[LENGTH] 63 

[MW] 7405.54 

[pi] 6.81 

[HOMOL] SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE). 1e-06

[EC] 3.6.1.1 Inorganic pyrophosphatase 5e-06

[PIRKW] hydrolase 5e-06

[SUPFAM] inorganic pyrophosphatase 5e-06

[KW] All_Beta

SEQ MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKEDTEAQGIFIDLSKIWKM 
PRD cccccccccccccccceeeeeecccccccccccccccccccccccccceeeechhhhhhh

SEQ AFL 
PRD ccc 


(No Prosite data available for DKFZphfbr2_64al5.3)
(No Pfam data available for DKFZphfbr2_64al5.3)
group: brain derived

DKFZphfbr2_64cl6.2 encodes a novel 101 amino acid protein without similarity to known
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by Qiagen 

Locus: /map="745_A_2; 756_F_2; 842C_2" 
Insert length: 1866 bp 

Poly A stretch at pos. 1848, polyadenylation signal at pos. 1829

1 GGGCGCGGCG CCGGAGGAGG AAGTGGTGAG GTTGTTGCTC CTTCAGCGCC 
51 TATCGCTGGC TCTTGGGGCG CAGAGAGGGG CCGCAGTCTC CGCGGCTGCG 
101 TCGAGCTCCC TTGCAGTCCC CTCCATGTTC CCCGGCGCCA CTACTCCCCT 
151 TCCTAAGGCC GCCGCTTACC CCGGGGTCTA TGGAAGTAAT GGAAGGACCC 
201 CTCAACCTGG CTCATCAACA GAGCAGACGA GCAGACCGTT TATTAGCTGC 
251 AGGCAAATAC GAAGAGGCTA TTTCTTGTCA CAAAAAGGCT GCAGCATATC 
301 TTTCTGAAGC CATGAAGCTG ACACAGTCAG AGCAGGCTCA TCTTTCACTG 
351 GAATTGCAAA GGGATAGCCA TATGAAACAG CTCCTCCTCA TCCAAGAGAG 
401 ATGGAAAAGG GCCCAGCGTG AAGAAAGATT GAAAGCCCAG CAGAACACAG 
451 ACAAGGATGC AGCTGCCCAT CTTCAGACAT CTCACAAACC CTCTGCAGAG 
501 GATGCAGAGG GCCAGAGTCC CCTTTCTCAG ARGTACAGCC CTTCCACAGA 
551 GAAATGCCTG CCTGAGATTC AGGGGATCTT TGACAGGGAT CCAGACACAC 
601 TACTTTATTT ACTTCAGCAA AAGAGTGAGC CAGCAGAGCC ATGTATTGGA 
651 AGCAAAGCCC CAAAAGATGA TAAAACAATT ATAGAGGAGC AGGCAACCAA 
701 AATTGCAGAT TTGAAGAGGC ATGTGGAATT CCTTGTGGCT GAGAATGAAA 
751 GATTAAGGAA AGAAAATAAA CAACTAAAGG CTGAAAAGGC CAGACTTCTA
801 AAAGGTCCAA TAGAAAAGGA GCTGGATGTA GATGCTGATT TTGTAGAAAC 
851 GTCAGAGTTA TGGAGCTTGC CACCACATGC AGAAACTGCT ACAGCCTCCT 
901 CAACCTGGCA GAAGTTCGCA GCAAATACTG GGAAAGCCAA GGACATTCCA 
951 ATCCCCAATC TTCCTCCCTT GGATTTTCCA TCTCCAGAAC TTCCTCTTAT 
1001 GGAGCTCTCT GAGGATATTC TGAAAGGACT TATGAATAAT TAAAATGGAA 
1051 GGCCACAGAA AAGGGGAAAA GAGGAAATAA TACAGTAATC GTTAATCCAG 
1101 CAAAAAGAAA TGAAAAGGGA AAACCACATA GAAGGGTAAT CCCGGAAATG 
1151 CTTCATCTGG TGGACTGTGG GAGCAGAGGC ATTGCCAGGA CTTGGGAAAC 
1201 AGTCACTGTG AAATGCGCTG CGTATCTCAT TCACTCACTT CAGCTAATGA 
1251 CTCCGACTTG GCAGACGCTA AACTCATGGA GGTTCGGTTT CTCCTGATAC 
1301 AAACCAAATG GCTACCTGGA AGAATTTCTT TCAAGCAACA GTTATTTTTC 
1351 TTATCTTCAG GGTTAAAATG TATAAAAGTT ATGTGTAATT AATCTATAAT 
1401 GCCATAAATG ATAATGCAAA ACCTAAATAA TATGGTGGCC GGAGGGGCTG 
1451 CCTTATATTT GAAACATGCT TTCTATCATG CATTGACTGT ATGCATTTTG 
1501 TTAATGCACA TTCTGTTTGT TTAAGGTGTG TGAGATACAC ACCTTTCTAG 
1551 ATGAAACTAT ATGTGCCACA CTTTGCACTA CTCATAATGA TAACCTCAAG 
1601 ACTATCAGAA GAAATATTTA AATTTCCATT TTATGAAGAA AGGAACCAAA 
1651 TTATTATGCT TTTTAAAACA AATTACCAGT TTACATAATT AATCAGGGTG 
1701 CATTTTAAGT TCTAACTTCG TTTATTGTAT AATGCATCAT TTGAAAATAC 
1751 CAAGGAGGAA ATACCCTTTG TTTTTAATGA TGCAAGAGTG GACGTAATGC 
1801 TAGTTGGCAG TATTTTATTG TAAGAAATCA ATAAAGTAAT TGTGTTTTAA 
1851 AAAAAAAAAA AAAAAA
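
Because the nucleotide listings are laid out as numbered rows of ten-base blocks, their internal consistency can be checked mechanically. The sketch below joins the blocks and verifies that each row label equals the position of the first base on that row; the function name read_numbered_sequence is hypothetical and the approach is offered only as an illustration.

```python
def read_numbered_sequence(listing: str) -> str:
    """Join the blocks of a numbered sequence listing, checking each row label."""
    sequence = []
    length = 0
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        position, blocks = int(fields[0]), fields[1:]
        if position != length + 1:
            raise ValueError(f"row labelled {position} starts at base {length + 1}")
        for block in blocks:
            sequence.append(block)
            length += len(block)
    return "".join(sequence)


rows = (
    "1 GGGCGCGGCG CCGGAGGAGG AAGTGGTGAG GTTGTTGCTC CTTCAGCGCC\n"
    "51 TATCGCTGGC TCTTGGGGCG CAGAGAGGGG CCGCAGTCTC CGCGGCTGCG"
)
print(len(read_numbered_sequence(rows)))
```

Comparing the length of the joined sequence with the stated insert length (here 1866 bp for the complete listing) is a quick way to detect bases lost or added during transcription.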


BLAST Results 


Entry HS286143 from database EMBL: 
human STS WI-6844. 
Score = 1460, P = 3.4e-61, identities = 292/292


Medline entries 


No Medline entry 
Peptide information for frame 2


ORF from the beginning to 304 bp; peptide length: 102 
Category: questionable ORF 
Classification: unset 


1 GAAPEEEVVR LLLLQRLSLA LGAQRGAAVS AAASSSLAVP SMFPGATTPL 
51 PKAAAYPGVY GSNGRTPQPG SSTEQTSRPF ISCRQIRRGY FLSQKGCSIS 
101 F 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64cl6, frame 2
No Alert BLASTP hits found 

Peptide information for frame 3 


ORF from 180 bp to 1040 bp; peptide length: 287 
Category: putative protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (178-200)
LEUCINE_ZIPPER (185-207)


1 MEVMEGPLNL AHQQSRRADR LLAAGKYEEA ISCHKKAAAY LSEAMKLTQS 
51 EQAHLSLELQ RDSHMKQLLL IQERWKRAQR EERLKAQQNT DKDAAAHLQT 
101 SHKPSAEDAE GQSPLSQKYS PSTEKCLPEI QGIFDRDPDT LLYLLQQKSE 
151 PAEPCIGSKA PKDDKTIIEE QATKIADLKR HVEFLVAENE RLRKENKQLK 
201 AEKARLLKGP IEKELDVDAD FVETSELWSL PPHAETATAS STWQKFAANT
251 GKAKDIPIPN LPPLDFPSPE LPLMELSEDI LKGLMNN 
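
The relationship between the nucleotide listing and the frame 3 peptide can be illustrated by translating the first codons of the ORF, which begins at position 180 of the insert. This is a minimal sketch using the standard genetic code; the names CODON_TABLE and translate are illustrative, and codons containing ambiguity codes (such as the R at position 532) are reported as X rather than resolved.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3].upper(), "X")  # X = ambiguous codon
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 180-200 of the insert listed above
print(translate("ATGGAAGTAATGGAAGGACCC"))
```

The 21 bases shown yield MEVMEGP, the first seven residues of the frame 3 peptide.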


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64cl6, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_64cl6, frame 2 


Report for DKFZphfbr2_64cl6.2 


[LENGTH] 101

[MW] 10469.94

[pI] 10.18

[KW] All_Alpha

[KW] LOW_COMPLEXITY 29.70 %


SEQ GAAPEEEVVRLLLLQRLSLALGAQRGAAVSAAASSSLAVPSMFPGATTPLPKAAAYPGVY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc

SEQ GSNGRTPQPGSSTEQTSRPFISCRQIRRGYFLSQKGCSISF 

SEG 

PRD ccccccccccccccccccccchhhhhhccccccccccccccc


(No Prosite data available for DKFZphfbr2_64cl6.2) 
(No Pfam data available for DKFZphfbr2_64cl6.2)

Pedant information for DKFZphfbr2_64cl6, frame 3

Report for DKFZphfbr2_64cl6.3


[LENGTH] 287

[MW] 32343.79

[pI] 5.61

[PROSITE] LEUCINE_ZIPPER 2

[KW] All_Alpha

[KW] COILED_COIL 14.98 %

SEQ MEVMEGPLNLAHQQSRRADRLLAAGKYEEAISCHKKAAAYLSEAMKLTQSEQAHLSLELQ 

PRD ccccchhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ RDSHMKQLLLIQERWKRAQREERLKAQQNTDKDAAAHLQTSHKPSAEDAEGQSPLSQKYS 

PRD hhcchhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccccccccccccccc 

COILS 

SEQ PSTEKCLPEIQGIFDRDPDTLLYLLQQKSEPAEPCIGSKAPKDDKTIIEEQATKIADLKR

PRD cccccccchhhhhcccccchhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC

SEQ HVEFLVAENERLRKENKQLKAEKARLLKGPIEKELDVDADFVETSELWSLPPHAETATAS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccc

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ STWQKFAANTGKAKDIPIPNLPPLDFPSPELPLMELSEDILKGLMNN

PRD hhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhccc 

COILS 


Prosite for DKFZphfbr2_64cl6.3

PS00029 178->200 LEUCINE_ZIPPER PDOC00029
PS00029 185->207 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphfbr2_64cl6.3)
DKFZphfbr2_64c4


group: brain derived 

DKFZphfbr2_64c4 encodes a novel 467 amino acid protein with similarity to A. thaliana T08I13.5

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.


similarity to A. thaliana T08I13.5

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC005043 11 exons 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1559 bp 

Poly A stretch at pos. 1540, no polyadenylation signal found 


1 TGGGACCGCC GGAAGTTTCT GCCGCGGCTT TGCGGGGACG GGGGAGTGGT 
51 AGTGGGGGCT GCAGCTGCCG GACCCAGGCG CGATGGCTAC GGGCGCGGAT 
101 GTACGGGACA TTCTAGAACT CGGGGGTCCA GAAGGGGATG CAGCCTCTGG 
151 GACCATCAGC AAGAAGGACA TTATCAACCC GGACAAGAAA AAATCCAAGA 
201 AGTCCTCTGA GACACTGACT TTCAAGAGGC CCGAGGGCAT GCACCGGGAA 
251 GTCTATGCCT TGCTCTACTC TGACAAGAAG GATGCACCCC CACTGCTACC 
301 CAGTGACACT GGCCAGGGAT ACCGTACAGT GAAGGCCAAG TTGGGCTCCA 
351 AGAAGGTGCG GCCTTGGAAG TGGATGCCAT TCACCAACCC GGCCCGCAAG 
401 GACGGAGCAA TGTTCTTCCA CTGGCGACGT GCAGCGGAGG AGGGCAAGGA 
451 CTACCCCTTT GCCAGGTTCA ATAAGACTGT GCAGGAGCCT GTGTACTCGG 
501 AGCAGGAGTA CCAGCTTTAT CTCCACGATA ATGCTTGGAC TAAGGCAGAA 
551 ACTGACCACC TCTTTGACCT CAGCCGCCGC TTTGACCTGC GTTTTGTTGT 
601 TATCCATGAC CGGTATGACC ACCAGCAGTT CAAGAAGCGT TCTGTGGAAG 
651 ACCTGAAGGA GCGGTACTAC CACATCTGTG CTAAGCTTGC CAACGTGCGG 
701 GCTGTGCCAG GCACAGACCT TAAGATACCA GTATTTGATG CTGGGCACGA 
751 ACGACGGCGG AAGGAACAGC TTGAGCGTCT CTACAACCGG ACCCCAGAGC 
801 AGGTGGCAGA GGAGGAGTAC CTGCTACAGG AGCTGCGCAA GATTGAGGCC 
851 CGGAAGAAGG AGCGGGAGAA ACGCAGCCAG GACCTGCAGA AGCTGATCAC 
901 AGCGGCAGAC ACCACTGCAG AGCAGCGGCG CACGGAACGC AAGGCCCCCA 
951 AAAAGAAGCT ACCCCAGAAA AAGGAGGCTG AGAAGCCGGC TGTTCCTGAG 
1001 ACTGCAGGCA TCAAGTTTCC AGACTTCAAG TCTGCAGGTG TCACGCTGCG 
1051 GAGCCAACGG ATGAAGCTGC CAAGCTCTGT GGGACAGAAG AAGATCAAGG 
1101 CCCTGGAACA GATGCTGCTG GAGCTTGGTG TGGAGCTGAG CCCGACACCT 
1151 ACGGAGGAGC TGGTGCACAT GTTCAATGAG CTGCGAAGCG ACCTGGTGCT 
1201 GCTCTACGAG CTCAAGCAGG CCTGTGCCAA CTGCGAGTAT GAGCTGCAGA 
1251 TGCTGCGGCA CCGTCATGAG GCACTGGCCC GGGCTGGTGT GCTAGGGGGC 
1301 CCTGCCACAC CAGCATCAGG CCCAGGCCCG GCCTCTGCTG AGCCGGCAGT 
1351 GTCTGAACCC GGACTTGGTC CTGACCCCAA GGACACCATC ATTGATGTGG 
1401 TGGGCGCACC CCTCACGCCC AATTCGAGAA AGCGACGGGA GTCGGCCTCC 
1451 AGCTCATCTT CCGTGAAGAA AGCCAAGAAG CCGTGAGAGG CCCCACGGGG 
1501 TGTGGGCGAC GCTGTTATGT AAATAGAGCT GCTGAGTTGG AAAAAAAAAA 
1551 AAAAAAAAA 


BLAST Results 


Entry AC005043 from database EMBL: 

Homo sapiens clone NH0576N21; HTGS phase 1, 5 unordered pieces. 
Score = 1506, P = 4.6e-244, identities = 316/330


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 83 bp to 1483 bp; peptide length: 467
Category: similarity to unknown protein


1 MATGADVRDI LELGGPEGDA ASGTISKKDI INPDKKKSKK SSETLTFKRP 
51 EGMHREVYAL LYSDKKDAPP LLPSDTGQGY RTVKAKLGSK KVRPWKWMPF 
101 TNPARKDGAM FFHWRRAAEE GKDYPFARFN KTVQEPVYSE QEYQLYLHDN 
151 AWTKAETDHL FDLSRRFDLR FVVIHDRYDH QQFKKRSVED LKERYYHICA
201 KLANVRAVPG TDLKIPVFDA GHERRRKEQL ERLYNRTPEQ VAEEEYLLQE 
251 LRKIEARKKE REKRSQDLQK LITAADTTAE QRRTERKAPK KKLPQKKEAE 
301 KPAVPETAGI KFPDFKSAGV TLRSQRMKLP SSVGQKKIKA LEQMLLELGV 
351 ELSPTPTEEL VHMFNELRSD LVLLYELKQA CANCEYELQM LRHRHEALAR
401 AGVLGGPATP ASGPGPASAE PAVSEPGLGP DPKDTIIDVV GAPLTPNSRK 
451 RRESASSSSS VKKAKKP 


BLASTP hits 

Entry ATAC2337_5 from database TREMBLNEW: 

gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13
genomic sequence, complete sequence.

Score = 340, P = 2.6e-30, identities = 115/374, positives = 176/374

Entry YE8D_SCHPO from database SWISSPROT:

HYPOTHETICAL 47.1 KD PROTEIN C9G1.13C IN CHROMOSOME I.

Score = 221, P = 1.9e-20, identities = 67/192, positives = 97/192

Entry S64291 from database PIR:

hypothetical protein YGR002c - yeast (Saccharomyces cerevisiae)
Score = 202, P = 2.8e-13, identities = 71/260, positives = 124/260


Alert BLASTP hits for DKFZphfbr2_64c4, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_64c4, frame 2 


Report for DKFZphfbr2_64c4.2


[LENGTH] 467

[MW] 53007.60

[pI] 9.51

[HOMOL] TREMBL:ATAC2337_5 gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC
T08I13 genomic sequence, complete sequence. 4e-29

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR002c] 1e-19

[PROSITE] MYRISTYL 1

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] TYR_PHOSPHO_SITE 3

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 12

[PROSITE] ASN_GLYCOSYLATION 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 20.13 %


SEQ MATGADVRDILELGGPEGDAASGTISKKDIINPDKKKSKKSSETLTFKRPEGMHREVYAL

SEG xxxxxxxxxxxxxxxxxx

PRD ccceeeeeeeeeeccccccccccccccccccccccccccccccccccccccchhhhhhhh


SEQ LYSDKKDAPPLLPSDTGQGYRTVKAKLGSKKVRPWKWMPFTNPARKDGAMFFHWRRAAEE 
SEG 

PRD *»hhhccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhc 

SEQ GKDYPFARFNKTVQEPVYSEQEYQLYLHDNAWTKAETDHLFDLSRRFDLRFVVIHDRYDH
SEG , 

PRD ccccccccccccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhc^ 

SEQ QQFKKRSVEDLKERYYHICAKLANVRAVPGTDLKIPVFDAGHERRRKEQLERLYNRTPEQ

PRD chhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcchhh

SEQ VAEEEYLLQELRKIEARKKEREKRSQDLQKLITAADTTAEQRRTERKAPKKKLPQKKEAE
XXJCXXX^XXXXXXXX »«»> •*«»•«•• XX xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KPAVPETAGIKFPDFKSAGVTLRSQRMKLPSSVGQKKIKALEQMLLELGVELSPTPTEEL 
SEG xxx

PRD hccccccccccccccccceeehhhhhhhccccccchhhhhhhhhhhhhhhhcccccchhh


SEQ VHMFNELRSDLVLLYELKQACANCEYELQMLRHRHEALARAGVLGGPATPASGPGPASAE 

SEG xxxxxxxxxxxxxxxx

PRD hhhhhhccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PAVSEPGLGPDPKDTIIDVVGAPLTPNSRKRRESASSSSSVKKAKKP 

SEG xxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD cccccccccccccceeeeeccccccccccccccccccccceeecccc 


Prosite for DKFZphfbr2_64c4.2

PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00002 412->416 GLYCOSAMINOGLYCAN PDOC00002
PS00004 35->39 CAMP_PHOSPHO_SITE PDOC00004
PS00004 39->43 CAMP_PHOSPHO_SITE PDOC00004
PS00004 184->188 CAMP_PHOSPHO_SITE PDOC00004
PS00004 451->455 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 321->324 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 448->451 PKC_PHOSPHO_SITE PDOC00005
PS00005 460->463 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006
PS00006 187->191 CK2_PHOSPHO_SITE PDOC00006
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 435->439 CK2_PHOSPHO_SITE PDOC00006
PS00007 131->139 TYR_PHOSPHO_SITE PDOC00007
PS00007 227->235 TYR_PHOSPHO_SITE PDOC00007
PS00007 116->125 TYR_PHOSPHO_SITE PDOC00007
PS00008 14->20 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_64c4.2)
DKFZphfbr2_64h6
group: brain derived 

DKFZphfbr2_64h6 encodes a novel 176 amino acid protein with similarity to predicted yeast
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to S.pombe SPBC337.09 and S.cerevisiae YER044c

complete cDNA, complete cds according to YER044c/SPBC337.09,
start at Bp 111, EST hits

Sequenced by Qiagen 

Locus: /map="14"

Insert length: 1212 bp 

Poly A stretch at pos. 1192, polyadenylation signal at pos. 1168 

1 GGGCTGGAGC TGTCCTGGGG GAGCTTGTTT GCGGCAGCGG CTGCTGCTGC 
51 CACTGCTGTG CTGGGGGCCC GGTCGCCAGG CAAAAAGCCC TCCCACGTTT 

101 GAGGGGAGTC ATGAGCCGTT TCCTGAATGT GTTAAGAAGT TGGCTGGTTA 

151 TGGTGTCCAT CATAGCCATG GGGAACACGC TGCAGAGCTT CCGAGACCAC 

201 ACTTTTCTCT ATGAAAAGCT CTACACTGGC AAGCCAAACC TTGTGAATGG 

251 CCTCCAAGCT CGGACCTTTG GGATCTGGAC GCTGCTCTCA TCAGTGATCC 

301 GCTGCCTCTG TGCCATTGAC ATTCACAACA AGACGCTCTA TCACATCACA 

351 CTCTGGACCT TCCTCCTTGC CCTGGGGCAT TTCCTCTCTG AGTTGTTTGT 

401 CTATGGAACT GCAGCTCCCA CGATTGGCGT CCTGGCACCC CTGATGGTGG 

451 CAAGTTTCTC CATCCTGGGT ATGCTGGTCG GGCTCCGGTA TCTAGAAGTA 

501 GAACCAGTAT CCAGACAGAA GAAGAGAAAC TGAGGCCAGC ATTATCACCT 

551 CCAGGACTTT CTCGTTTTCC ACCTTGGCCA TCTTCTTCCT TCGTCGTCTC 

601 TCCCCTTTAA TTTCTTTTCT ATTCCATCAT CTGCCCTTTT ACTCACTTTT 

651 AGCCTCTTTT TTTAATTTTT AAAATTTAAA GATATGCATA CTGAAAAGTA 

701 TATAACATGT ACGTACAATT TAAAGAATAA TTTTAAAGTG AATACTACGT 

751 AACTCCATCC AAGTCAAGAA ATTGCCAGCT TCTCGGAAGC CCACTGTGTC 

801 TCCTTCCCCT ACCTGCAACC TCTTCCAGGC TCCCTTTTCC AGCCTTCCCC 

851 TTTTTCCCTT TTATTTTCAT GCCTTGATTT GACTTGTGTG GTGGGAACAT 

901 GTGAACTATG AAACTTAAAC CTGCTGCCCA CCCAGAGCAG CTGTGACCAA 

951 GGGCTGCCTC AAGGGGTTGT CCACGCAGGT TGGGCTCCTC TCTGCTGCTG 
1001 GACCCAAGAC TCTGAACCTT CCAAGGGACA GGCAGTTCTT CTGAGAAGGG 
1051 CTCCCCTGTG TGTGAGCAAG ACCACAGCTC TCCTTCTATC TACAGATGCA 
1101 TGAGGGTTGG AAGAGTCTGG GCTGTTTTTA GACCTTCTGG TCAGCTGTAT 
1151 TTGTGTAACA ACTTTTGTAA TAAATAGAAA AACCCTCTGC TCAAAAAAAA 
1201 AAAAAAAAAA AA

BLAST Results 


Entry G38566 from database EMBL: 

SHGC-64295 Human Homo sapiens STS genomic, sequence tagged site 
Score = 1398, P = 1.4e-56, identities = 284/288


Medline entries 

No Medline entry 

Peptide information for frame 3 

ORF from 0 bp to 530 bp; peptide length: 177 
Category: similarity to unknown protein 
Classification: unclassified 

1 AGAVLGELVC GSGCCCHCCA GGPVARQKAL PRLRGVMSRF LNVLRSWLVM
51 VSIIAMGNTL QSFRDHTFLY EKLYTGKPNL VNGLQARTFG IWTLLSSVIR 
101 CLCAIDIHNK TLYHITLWTF LLALGHFLSE LFVYGTAAPT IGVLAPLMVA 
151 SFSILGMLVG LRYLEVEPVS RQKKRN


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64h6, frame 3

TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337., N = 1, Score = 224, P =
1.4e-18

PIR:S50547 hypothetical protein YER044c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 192, P = 3.4e-15


>TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337.
Length = 136

HSPs:

Score = 224 (33.6 bits), Expect = 1.4e-18, P = 1.4e-18
Identities = 49/113 (43%), Positives = 74/113 (65%) 

Query: 42 NVLRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRC 101 

+++ W V+VS+ A+ NT+QSF L +++Y+ N VNGLQ RTFGIWTLLS+++R 

Sbjct: 11 SLVAKWNVVVSVAALFNTVQSFLTPK-LTKRVYSNT-NEVNGLQGRTFGIWTLLSAIVRF 68 

Query: 102 LCAIDIHNKTLYHITLWTFLLALGHFLSELFVYGTAAPTIGVLAPLMVASFSI 154 

CA I N +Y + T+ LA HFLSE ++ T G+L+P++V++ SI 

Sbjct: 69 YCAYHITNPDVYFLCQCTYYLACFHFLSEWLLFRTTNLGPGLLSPIVVSTVSI 121


Pedant information for DKFZphfbr2_64h6, frame 3 


Report for DKFZphfbr2_64h6.3


[LENGTH] 176

[MW] 19359.31

[pI] 9.53

[HOMOL] TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical protein";
S.pombe chromosome II cosmid c337. 2e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YER044c] 7e-16

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 7.39 %

SEQ AGAVLGELVCGSGCCCHCCAGGPVARQKALPRLRGVMSRFLNVLRSWLVMVSIIAMGNTL

SEG xxxxxxxxxxxxx 

PRD ccceeeeeeeeccceeeeccccccccccccccccchhhhhhhhhhhhhhheeeecccccc 
MEM MMMMMMMMMMMMMMMMM .... 

SEQ QSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLCAIDIHNKTLYHITLWTF 

SEG 

PRD ccccchhhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhccccceeeehhhhh 

MEM 


SEQ LLALGHFLSELFVYGTAAPTIGVLAPLMVASFSILGMLVGLRYLEVEPVSRQKKRN 

SEG 

PRD hhhhhhhhhhhhhhhccccccccccceeehhhhhhhhhhhheeeeecccccccccc 

MEM MMMMMMMMMMMMMMMMM 


(No Prosite data available for DKFZphfbr2_64h6.3)
(No Pfam data available for DKFZphfbr2_64h6.3)
DKFZphfbr2_64jl8

group: Intracellular transport and trafficking 

ssj'sjrs;:%5 si2K^vi?;r ;:s;: ;srj'=:!4: - — ' 

Strong similarity to dog signal peptidase (EC 3.4.99.-) 

complete cDNA, complete cds, potential start at Bp 109, EST hits, 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 690 bp 

Poly A stretch at pos. 666, polyadenylation signal at pos. 646

J ^.n"^^^^^^^ GCGCACCGCA GACGGCGCGG ATCGCAGGGA GCCGGTCCGC 

51 CGCCGGAACG GGAGCCTGGG TGTGCGTGTG GAGTCCGGAC TCGTGGGAGA 
101 CGATCGCGAT GAACACGGTG CTGTCGCGGG CGAACTCACT GTTCGCCTTC 
oni JSS^I^nS^^ TGATGGCGGC GCTCACCTTC GGCTGCTTCA TCACCACCGC 
201 CTTCAAAGAC AGGAGCGTCC CGGTGCGGCT GCACGTCTCG CGGATCATGC 
11^ l^^:^'^^'^ AGAAGATTTC ACTGGACCTA GAGAAAGAAG TGATCTGGGA 
301 TTTATCACAT CTGATATAAC TGCTGATCTA GAGAATATAT TTGATTGGAA 
351 TGTTAAGCAG TTGTTTCTTT ATTTATCAGC AGAATATTCA ACAAAAAATA 
401 ATGCTCTGAA CCAAGTTGTC CTATGGGACA AGATTGTTTT GAGAGGTGAT
451 AATCCGAAGC TGCTGCTGAA AGATATGAAA ACAAAATATT TTTTCTTTGA 
501 CGATGGAAAT GGTCTCAAGG GAAACAGGAA TGTCACTTTG ACCCTGTCTT 
551 GGAACGTCGT ACCAAATGCT GGAATTCTAC CTCTTGTGAC AGGATCAGGA 
601 CACGTATCTG TCCCATTTCC AGATACATAT GAAATAACGA AGAGTTATTA 
651 AATTATTCTG AATTTGAAAC AAAAAAAAAA AAAAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


89034208:

cDNA-derived primary structure of the glycoprotein component of canine
microsomal signal peptidase complex.


Peptide information for frame 1 


ORF from 109 bp to 648 bp; peptide length: 180 
Category: strong similarity to known protein 
Prosite motifs: TONB_DEPENDENT_REC_1 (1-58)
RGD (148-151)


1 MNTVLSRANS LFAFSLSVMA ALTFGCFITT AFKDRSVPVR LHVSRIMLKN 

51 VEDFTGPRER SDLGFITSDI TADLENIFDW NVKQLFLYLS AEYSTKNNAL 

101 NQVVLWDKIV LRGDNPKLLL KDMKTKYFFF DDGNGLKGNR NVTLTLSWNV 

151 VPNAGILPLV TGSGHVSVPF PDTYEITKSY


BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_64jl8, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_64jl8, frame 1


Report for DKFZphfbr2_64jl8.1


[LENGTH] 180

[MW] 20253.39

[pI] 8.66

[HOMOL] PIR:A31788 signal peptidase (EC 3.4.99.-) (SPC 22/23) - dog 1e-100

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, ^ 6e-15

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation,
palmitylation, farnesylation and processing) [S. cerevisiae, YLR066w] 6e-15

[PIRKW] transmembrane protein 2e-92

[PIRKW] glycoprotein 2e-92

[PIRKW] hydrolase 2e-92

[PROSITE] RGD 1

[PROSITE] MYRISTYL 2

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TONB_DEPENDENT_REC_1 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 32



SEQ MNTVLSRANSLFAFSLSVMAALTFGCFITTAFKDRSVPVRLHVSRIMLKNVEDFTGPRER

PRD ccccccchhhhhhhhhhhhhhhhhhhhhheeecccccceeehhhhhhhhhhhhccccccc

SEQ SDLGFITSDITADLENIFDWNVKQLFLYLSAEYSTKNNALNQVVLWDKIVLRGDNPKLLL

PRD ccccchhhhhhhhccccccchhhhhhhhhhhhhhhccccceeeeeeeceeecccchhhhh 

SEQ KDMKTKYFFFDDGNGLKGNRNVTLTLSWNVVPNAGILPLVTGSGHVSVPFPDTYEITKSY

PRD hhcccceeeeecccccccccceeeeeeeecccccceeeeeccccceeeeccccccccccc 


Prosite for DKFZphfbr2_64jl8.1


PS00001 141->145 ASN_GLYCOSYLATION PDOC00001
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00008 25->31 MYRISTYL PDOC00008
PS00008 135->141 MYRISTYL PDOC00008
PS00013 16->27 PROKAR_LIPOPROTEIN PDOC00013
PS00016 112->115 RGD PDOC00016
PS00430 1->22 TONB_DEPENDENT_REC_1 PDOC00354


(No Pfam data available for DKFZphfbr2_64jl8.1)
DKFZphfbr2_64k24


group: transmembrane proteins 

DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with weak similarity to several known
proteins.

The novel protein contains 5 transmembrane regions. 

No informative BLAST results; No predictive prosite, pfam or SCOP motife. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells.

similarity to AMAC1 "testicular condensing enzyme";
membrane regions: 5

Summary DKFZphfbr2_64k24 encodes a novel 412 amino acid protein, with
similarity to AMAC1; product: "testicular condensing enzyme"

similarity to AMAC1 "testicular condensing enzyme"

complete cDNA, complete cds, EST hits

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1958 bp

Poly A stretch at pos. 1939, polyadenylation signal at pos. 1918


1 GGGCCCGCCT CGATTTTCCC AGGCGAGGGC ACGCCCGCGT CAGTCGCCTC 
51 CGGGGCACCT TCCTCGCCAC GACACGCAGG TAACCGGGCC CCGGGAGCCG 
101 GTCGGCGGCG GCGGACTGGG ACCTTGATCC TGCCTGCCCG GCCGCCCGAC 
151 AAGGGAATGA GAGCGGACCC CGAACTCCAC ACACCCGCGT TTAGCCGCCA 
201 CACCTAAGGG GCAGAACAGT CTTTTTGGGT AAGGGCCGGG CTGGGGGCGA 
251 CGCGCCCCGC CCGCTTTGCA GACTTCGGGG TGCTCTGCAC GACGCCTGAA 
301 AGGCCGCGGG GCCCGCATTT CTCTGTGCTG CCCTCCTGGA GAACCGGGAC 
351 ACGGGGACGG GAGGGCCAGC ATCGGCTACG GCCCGGTTTC CCGTTTCTTT 
401 CCTCTGTCGC GTCTGGGCCC TCCTGCAGCG TCCATGATGA AGGCCAGGGG 
451 CTGTTGCTTT CCTCTCGCCC AGTAGCCAAC CCAAGCAAGG GAATTAATTA 
501 TCTGAAGAAA TGGATACTTC TCCCTCCAGA AAATATCCAG TTAAAAAACG 
551 GGTGAAAATA CATCCCAACA CAGTGATGGT GAAATATACT TCTCATTATC 
601 CCCAGCCTGG CCATGATGGA TATGAAGAAA TCAATGAAGG CTATGGGAAT
651 TTTATGGAGG AAAATCCAAA GAAAGGTCTG CTGAGTGAAA TGAAAAAAAA 
701 AGGGAGAGCT TTCTTTGGAA CCATGGATAC CCTACCTCCA CCAACAGAAG 
751 ACCCAATGAT CAATGAGATT GGACAATTCC AGAGCTTTGC AGAAAAAAAC 
801 ATTTTTCAAT CCCGAAAAAT GTGGATAGTG CTGTTTGGAT CTGCTTTGGC 
851 TCATGGATGT GTAGCTCTTA TCACTAGGCT TGTTTCTGAT CGGTCTAAAG 
901 TTCCATCTCT AGAACTGATT TTTATCCGTT CTGTTTTTCA GGTCTTATCT 
951 GTGTTAGTTG TGTGTTACTA TCAGGAGGCC CCCTTTGGAC CCAGTGGATA 
1001 CAGATTACGA CTCTTCTTTT ATGGTGTATG CAATGTCATT TCTATCACTT 
1051 GTGCTTATAC ATCATTTTCA ATAGTTCCTC CCAGCAATGG GACCACTATG 
1101 TGGAGAGCCA CAACTACAGT CTTCAGTGCC ATTTTGGCTT TTTTACTCGT 
1151 AGATGAGAAA ATGGCTTATG TTGACATGGC TACAGTTGTT TGCAGCATCT 
1201 TAGGTGTTTG TCTTGTCATG ATCCCAAACA TTGTTGATGA AGACAATTCT 
1251 TTGTTAAATG CCTGGAAAGA AGCCTTTGGG TACACCATGA CTGTGATGGC 
1301 TGGACTGACC ACTGCTCTCT CAATGATAGT ATACAGATCC ATCAAGGAGA 
1351 AGATCAGCAT GTGGACTGCG CTGTTTACTT TTGGTTGGAC TGGGACAATT 
1401 TGGGGAATAT CTACTATGTT TATTCTTCAA GAACCCATCA TCCCATTAGA 
1451 TGGAGAAACC TGGAGTTATC TCATTGCTAT ATGTGTCTGT TCTACTGCAG 
1501 CATTCTTAGG AGTTTATTAT GCCTTGGACA AATTCCATCC AGCTTTGGTT 
1551 AGCACAGTAC AACATTTGGA GATTGTGGTA GCTATGGTCT TGCAGCTTCT 
1601 CGTGCTGCAC ATATTTCCTA GCATCTATGA TGTTTTTGGA GGGGTAATCA 
1651 TTATGATTAG TGTTTTTGTC CTTGCTGGCT ATAAACTTTA CTGGAGGAAT 
1701 TTAAGAAGGC AGGACTACCA GGAAATACTA GACTCTCCCA TTAAATGAAT 
1751 ACCTGATTAT TATTGTCTCA TTAATGTTCA GTTATTAATA TGTATACTGC 
1801 CATTTTAATG TTTACCTATG AATGTCTTTT GTGTTATATA ACTGACAGAG 
1851 TGCTATAAAA TATATAATAT ATACAAATGC AGAAAATTTA TTCTAGTCTA 
1901 ATATATTCAA ATACAAATAT TAAATATATG AAATACGTTA AAAAAAAAAA
1951 AAAAAAAA 


BLAST Results 


No BLAST result 
Medline entries


No Medline entry 


Peptide information for frame 3 


ORF from 510 bp to 1745 bp; peptide length: 412 
Category: similarity to known protein 


1 MDTSPSRKYP VKKRVKIHPN TVMVKYTSHY PQPGDDGYEE INEGYGNFME 
51 ENPKKGLLSE MKKKGRAFFG TMDTLPPPTE DPMINEIGQF QSFAEKNIFQ 
101 SRKMWIVLFG SALAHGCVAL ITRLVSDRSK VPSLELIFIR SVFQVLSVLV 
151 VCYYQEAPFG PSGYRLRLFF YGVCNVISIT CAYTSFSIVP PSNGTTMWRA 
201 TTTVFSAILA FLLVDEKMAY VDMATVVCSI LGVCLVMIPN IVDEDNSLLN
251 AWKEAFGYTM TVMAGLTTAL SMIVYRSIKE KISMWTALFT FGWTGTIWGI 
301 STMFILQEPI IPLDGETWSY LIAICVCSTA AFLGVYYALD KFHPALVSTV 
351 QHLEIVVAMV LQLLVLHIFP SIYDVFGGVI IMISVFVLAG YKLYWRNLRR 
401 QDYQEILDSP IK

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_64k24, frame 3

TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA,
complete cds., N = 1, Score = 191, P = 1.9e-12

TREMBL:BMAJ733_6 product: "hypothetical protein"; Bacillus megaterium
bgaM gene, N = 1, Score = 137, P = 1.6e-06

PIR:G71841 hypothetical protein jhp1155 - Helicobacter pylori (strain
J99), N = 1, Score = 129, P = 1.3e-05


>TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA, complete
cds.

Length = 362

HSPs:

Score = 191 (28.7 bits), Expect = 1.9e-12, P = 1.9e-12
Identities = 39/105 (37%), Positives = 66/105 (62%)

Query: 289 FTFGWTGTIWGISTMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVS 348 

F FG G + + +F+LQ P++P D +WS ++A+ + + +F+ V YA+ K HPALV 
Sbjct: 248 FLFGLVGLMVSVPGLFVLQTPVLPQDTLSWSCVVAVGLLALVSFVCVSYAVTKAHPALVC 307

Query: 349 TVQHLEIVVAMVLQLLVLH--IFPSIYDVFGGVIIMISVFVLAGYKL 393

V H E+VVA++LQ VL+ + PS D+ G +++ S+ ++ L
Sbjct: 308 AVLHSEVVVALMLQYYVLYETVAPS--DIMGAGVVLGSIAIITAQNL 352


Pedant information for DKFZphfbr2_64k24, frame 3 


Report for DKFZphfbr2_64k24.3


[LENGTH] 412

[MW] 46449.87

[pI] 6.99

[HOMOL] TREMBL:AF016712_1 gene: "AMAC1"; product: "testicular condensing enzyme"; Mus
musculus testicular condensing enzyme (AMAC1) mRNA, complete cds. 8e-14

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 5

SEQ MDTSPSRKYPVKKRVKIHPNTVMVKYTSHYPQPGDDGYEEINEGYGNFMEENPKKGLLSE 
PRD ccccccccccccceeeecccceeeeeecccccccccceeeeecccccccccccccchhhh
MEM

SEQ MKKKGRAFFGTMDTLPPPTEDPMINEIGQFQSFAEKNIFQSRKMWIVLFGSALAHGCVAL
PRD hhhhcceeecccccccccccccceeeecccchhhhhhhhccceeeeeeeccccchhhhhc
MEM

SEQ ITRLVSDRSKVPSLELIFIRSVFQVLSVLVVCYYQEAPFGPSGYRLRLFFYGVCNVISIT
PRD chhhhhccccccccchhhhhhhhhhhheeeeeeeccccccccceeeeeeeecceeeeeee
MEM MMMMMMMMMMMMMMMMM

SEQ CAYTSFSIVPPSNGTTMWRATTTVFSAILAFLLVDEKMAYVDMATVVCSILGVCLVMIPN
PRD eccceeeeccccccceeeeeehhhhhhhhhhhhhhhhheeeeeeeeeeeeeeeeeeeecc
MEM

SEQ IVDEDNSLLNAWKEAFGYTMTVMAGLTTALSMIVYRSIKEKISMWTALFTFGWTGTIWGI
PRD cccccchhhhhhhhhhhheeeeeeehhhhhhhcchhhhhhhhhhhhccccccccceeeec
MEM MMMMMMMMMMMMMMMMMMM

SEQ STMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVSTVQHLEIVVAMV
PRD ceeeeeecccccccccceeeeeccchhhhhhhhhccccccccccchhhhhhhhhhhhhhh
MEM MMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMM

SEQ LQLLVLHIFPSIYDVFGGVIIMISVFVLAGYKLYWRNLRRQDYQEILDSPIK
PRD hhhhhhhhhccccccceeeeeeeeeecccccchhhhhhhhhhhhhhhccccc
MEM MMMMMMM....MMMMMMMMMMMMMMMMMMMMM


Prosite for DKFZphfbr2_64k24.3 


PS00001 193->197 ASN_GLYCOSYLATION PDOC00001 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005 
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005 
PS00006 92->96 CK2_PHOSPHO_SITE PDOC00006 
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00008 70->76 MYRISTYL PDOC00008 
PS00008 88->94 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 265->271 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 334->340 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_64k24.3) 
DKFZphfbr2_6a17 
group: brain derived 

DKFZphfbr2_6a17 encodes a novel 100 amino acid protein with very weak similarity to human 
finger protein zfOC1. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1424 bp 

Poly A stretch at pos. 1405, polyadenylation signal at pos. 1389 

1 GGGACTGAGG GGGTGGGCTT ACTCCCTGGG CAGTCTTGGG GGCCAGAGCT 
51 GAGGCCAGTC CATATTACAG TGGCTGGGCT GTTTTTTTCA GTAGCCCCTA 

101 GCATTGGCTG GGATTCCTGT TCCTGGGTGC GCCTCCACCT CCCTTCTGAT 

151 GCTTCCTGGC TATGGTGGGG TGGGAACCTC AGTTTCCCCC AAAGTCTTCC 

201 CTGGATGCTG GCTTCAGGTT GAAGACCCTG GTTCTTCCAG TTCCTCACGG 

251 GTTAGGTAGG GGCTCCTGCA TCACCTTCAG AATCAGTTCC AACCCCCACT 

301 CTCCTTAGGC TTTGTGCTCT GCTCTGCCCT GCCAGGCTGC CCTTGTCCAT 

351 GTGAGTAGCA TGGGCGGGTG GTGGGGACGG CAGTGGTGAT GAAGGGGGTG 

401 CACCACAGGC CTCATGAAGC AGTTCCCACA TGGGCGTGTG GCTGGGGCGT 

451 GGCCACCACA GAGCACATGG CTGTGTCTAG GCGCAAGCAC TTTAGCAGTA 

501 TCTGTTTACA TGCGCAAGGA TCAAGCCGAC TACCTGTGCT GTCTACTGGG 

551 ACAGCAGTCT CCGAGCTACT CCGTACCTCC CTCTGCCAGG TCGTGGAGTT 

601 AGGCCCCAGT CCCTACTTGT CACTGGTTCC CACTGTGCTC CTAACTGTGC 

651 AGCACCTGGG AGCTCTGGCC TGGGGCTGGA GGCCCTGGTA GGAGCTGCAG 

701 TTGGAGGCCG TTCTGTGCCC AGCAGCGGTG AGCGGCTCCC ATGGGCCCTG 

751 TGTCTGCAGG GAGCCAGGGC TGCGGCACAT GTGCTGTGAA ACTGGCACCC 

801 ACCTGGCGTG CTGCTGCCGC CACTTGCTTC CTGCAGCACC TCCTACCCTG 

851 CTCCGTGTCC TCCCTCTCCC CGCGCCTGGC TCAGGAGTGC TGGAAAAGCT 

901 CACGCCTCGG CCTGGGAGCC TGGCCTCTTG ATATACCTCG AGCTTCCCCT 

951 GTGCTCCCCA GCCCCAGGAC CACTGGCCCC TTGGCCTGAG GGGCTGGGGG 
1001 CCCCACGACC TGCAGCGTCG AGTCCGGGAG AGAGCCCGGA GCGGCGTGCC 
1051 ATCTCGGCTC GGCCTTGCTG AGAGCCTCCG CCCTGGCTTT CTCCCTGTCT 
1101 GGTTTCAGTG GCTCACGTTG GTGCTACACA GCTAGAATAG ATATATTTAG 
1151 AGAGAGAGAT ATTTTTAAGA CAAAGCCCAC AATTAGCTGT CCTTTAACAC 
1201 CGCAGAACCC CCTCCCAGAA GAAGAGCGAT CCCTCGGACG GTCCGGGCGG 
1251 GCACCCTCAG CCGGGCTCTT TGCAGAAGCA GCACCGCTGA CTGTGGGCCC 
1301 GGCCCTCAGA TGTGTACATA TACGGCTATT TCCTATTTTA CTGTTCTTCA 
1351 GATTTAGTAC TTGTAAATAA ACACACACAT TAAGGAGAGA TTAAACATTT 
1401 TTGCCAAAAA AAAAAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 389 bp to 688 bp; peptide length: 100 
Category: putative protein 


1 MKGVHHRPHE AVPTWACGWG VATTEHMAVS RRKHFSSICL HAQGSSRLPV 
51 LSTGTAVSEL LRTSLCQVVE LGPSPYLSLV PTVLLTVQHL GALAWGWRPW 


BLASTP hits 
Entry S70007 from database PIR: 

finger protein zfOC1 - human (fragment) 

Length = 183 

Score = 62 (21.8 bits), Expect = 0.24, Sum P(2) = 0.22 
Identities = 18/47 (38%), Positives = 24/47 (51%) 


Alert BLASTP hits for DKFZphfbr2_6a17, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6a17, frame 2 

Report for DKFZphfbr2_6a17.2 


[LENGTH] 100 
[MW] 10944.82 
[pI] 9.49 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 
[KW] Alpha_Beta 


SEQ MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPVLSTGTAVSEL 

PRD cccccccccccccccccccccchhhhhhhhhhcccccceeeccccccceeecccchhhhh 

SEQ LRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW 

PRD hhhhheeeeecccccceeecchhhhhhhhhchhhhhcccc 


Prosite for DKFZphfbr2_6a17.2 


PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 

PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005 

PS00008 20->26 MYRISTYL PDOC00008 

PS00008 54->60 MYRISTYL PDOC00008 
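
Rows of this kind can be tallied per motif and compared with the [PROSITE] counts in the Pedant report. This is a minimal sketch, assuming rows already cleaned of OCR noise; the column layout (accession, start->end, motif name, documentation entry) is taken from the table above, while the regular expression and the helper name `parse_prosite_rows` are hypothetical.

```python
import re
from collections import Counter

# One row: accession, start->end, motif name, documentation entry.
ROW = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)\s+(PDOC\d{5})$")


def parse_prosite_rows(text):
    """Return (accession, start, end, name, doc) tuples for well-formed rows."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            acc, start, end, name, doc = match.groups()
            hits.append((acc, int(start), int(end), name, doc))
    return hits


table = """
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005
PS00008 20->26 MYRISTYL PDOC00008
PS00008 54->60 MYRISTYL PDOC00008
"""

hits = parse_prosite_rows(table)
counts = Counter(name for _, _, _, name, _ in hits)
print(counts)
```

Rows that do not match the pattern are skipped silently, so a count lower than the Pedant summary would point to residual OCR damage in a row rather than to a missing motif.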


(No Pfam data available for DKFZphfbr2_6a17.2) 
DKFZphfbr2_6b24 


group: metabolism 

DKFZphfbr2_6b24 encodes a novel 334 amino acid protein with similarity to several bacterial 
dTDP-4-dehydrorhamnose reductases (EC 1.1.1.133). 

The novel protein seems to be a human enzyme similar to dTDP-4-dehydrorhamnose reductases. EC 
1.1.1.133 catalyses the reaction: dTDP-6-deoxy-L-mannose + NADP(+) <=> 
dTDP-4-dehydro-6-deoxy-L-mannose + NADPH. 

The new protein can find application in modulation of rhamnose metabolism and as a new enzyme 
for biotechnological production processes. 


similar to dTDP-6-deoxy-L-mannose-dehydrogenases 
complete cDNA, EST hits, complete cds 

Nucleotide sugars metabolism seems to be a dehydrogenase 
localisation: region of primer A missing 

Sequenced by AGOWA 

Locus: /map="5" 

Insert length: 2054 bp 

Poly A stretch at pos. 2028, polyadenylation signal at pos. 2015 


1 GGGGGAGGCC CGCGTCGATC CTGGGTTGGA GGAGGTGGCG GCCGCTGAGG 
51 CTGCGGCGTG AAGACGGCGG GCATGGTGGG GCGGGAGAAA GAGCTCTCTA 
101 TACACTTTGT TCCCGGGAGC TGTCCCCTGG TGGAGGAGGA AGTTAACATC 
151 CCTAATAGGA GGGTTCTGGT TACTGGTGCC ACTGGGCTTC TTGGCAGAGC 
201 TGTACACAAA GAATTTCAGC AGAATAATTG GCATGCAGTT GGCTGTGGTT 
251 TCAGAAGAGC AAGACCAAAA TTTGAACAGG TTAATCTGTT GGATTCTAAT 
301 GCAGTTCATC ACATCATTCA TGATTTTCAG CCCCATGTTA TAGTACATTG 
351 TGCAGCAGAG AGAAGACCAG ATGTTGTAGA AAATCAGCCA GATGCTGCCT 
401 CTCAACTTAA TGTGGATGCT TCTGGGAATT TAGCAAAGGA AGCAGCTGCT 
451 GTTGGAGCAT TTCTCATCTA CATTAGCTCA GATTATGTAT TTGATGGAAC 
501 AAATCCACCT TACAGAGAGG AAGACATACC AGCTCCCCTA AATTTGTATG 
551 GCAAAACAAA ATTAGATGGA GAAAAGGCTG TCCTGGAGAA CAATCTAGGA 
601 GCTGCTGTTT TGAGGATTCC TATTCTGTAT GGGGAAGTTG AAAAGCTCGA 
651 AGAAAGTGCA GTGACTGTTA TGTTTGATAA AGTGCAGTTC AGCAACAAGT 
701 CAGCAAACAT GGATCACTGG CAGCAGAGGT TCCCCACACA TGTCAAAGAT 
751 GTGGCCACTG TGTGCCGGCA GCTAGCAGAG AAGAGAATGC TGGATCCATC 
801 AATTAAGGGA ACCTTTCACT GGTCTGGCAA TGAACAGATG ACTAAGTATG 
851 AAATGGCATG TGCAATTGCA GATGCCTTCA ACCTCCCCAG CAGTCACTTA 
901 AGACCTATTA CTGACAGCCC TGTCCTAGGA GCACAACGTC CGAGAAATGC 
951 TCAGCTTGAC TGCTCCAAAT TGGAGACCTT GGGCATTGGC CAACGAACAC 
1001 CATTTCGAAT TGGAATCAAA GAATCACTTT GGCCTTTCCT CATTGACAAG 
1051 AGATGGAGAC AAACGGTCTT TCATTAGTTT ATTTGTGTTG GGTTCTTTTT 
1101 TTTTTTAAAT GAAAAGTATA GTATGTGGCC CTTTTTAAAG AACAAAGGAA 
1151 ATAGTTTTGT ATGAGTACTT TAATTGTGAC TCTTAGGATC TTTCAGGTAA 
1201 ATGATGCTCT TGCACTAGTG AAATTGTCTA AAGAAACTAA AGGGCAGTCA 
1251 TGCCCTGTTT GCAGTAATTT TTCTTTTTAT CATTATGTTT GTCCTGGCTA 
1301 AACTTGGAGT TTGAGTATAG TAAATTATGA TCCTTAAATA TTTGAGGGTC 
1351 AGGATGAAGC AGATCTGCTG TAGACTTTTC AGATGAAATT GTTCATTCTC 
1401 GTAACCTCCA TATTTTCAGG ATTTTTGAAG CTGTTGACCA TTTCATGTTG 
1451 ATTATTTTAA ATTGTGTGGA ATAGTATAAA AATCATTGGT GTTCATTATT 
1501 TGCTTTGCCT GAGCTCAGAT CAAAATGTTT GAAGAAAGGA ACTTTATTTT 
1551 TGCAAGTTAC GTACAGTTTT TATGCTTGAG ATATTTCAAC ATGTTATGTA 
1601 TATTGGAACT TCTACAGCTT GATGCCTCCT GCTTTTATAG CAGTTTATGG 
1651 GGAGCACTTG AAAGAGCGTG TGTACATGTA TTTTTTTTCT AGGCAAACAT 
1701 TGAATGCAAA CGTGTATTTT TTTAATATAA ATATATAACT GTCCTTTTCA 
1751 TCCCATGTTG CCGCTAAGTG ATATTTCATA TGTGTGGTTA TACTCATAAT 
1801 AATGGGCCTT GTAAGTCTTT TCACCATTCA TGAATAATAA TAAATATGTA 
1851 CTGCTGGCAT GTAATGCTTA GTTTTCTTGT ATTTACTTCT TTTTTTTAAA 
1901 TGTAAGGACC AAACTTCTAA ACTAATTGTT CTTTTGTTGC TTTAATTTTT 
1951 AAAAATTACA TTCTTCTGAT GTAACATGTG ATACATACAA AAGAATATAG 
2001 TTTAATATGT ATTGAAATAA AACACAATAA AATTAAAAAA AAAAAAAAAA 
2051 AAAA 


BLAST Results 


Entry G37115 from database EMBL: 
SHGC-56899 Human Homo sapiens STS genomic. 
Score = 446, P = 4.6e-14, identities = 90/91 


Medline entries 


99109950: 

The metabolism of 6-deoxyhexoses in bacterial and animal 
cells. 


Peptide information for frame 1 


ORF from 73 bp to 1074 bp; peptide length: 334 
Category: similarity to known protein 


1 MVGREKELSI HFVPGSCRLV EEEVNIPNRR VLVTGATGLL GRAVHKEFQQ 
51 NNWHAVGCGF RRARPKFEQV NLLDSNAVHH IIHDFQPHVI VHCAAERRPD 
101 VVENQPDAAS QLNVDASGNL AKEAAAVGAF LIYISSDYVF DGTNPPYREE 
151 DIPAPLNLYG KTKLDGEKAV LENNLGAAVL RIPILYGEVE KLEESAVTVM 
201 FDKVQFSNKS ANMDHWQQRF PTHVKDVATV CRQLAEKRML DPSIKGTFHW 
251 SGNEQMTKYE MACAIADAFN LPSSHLRPIT DSPVLGAQRP RNAQLDCSKL 
301 ETLGIGQRTP FRIGIKESLW PFLIDKRWRQ TVFH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6b24, frame 1 

PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans, N = 1, Score = 293, P = 6.4e-26 

TREMBL:SSU51197_21 gene: "rhsD"; product: 
"dTDP-6-deoxy-L-mannose-dehydrogenase"; Sphingomonas S88 sphingan 
polysaccharide synthesis (spsG), (spsS), (spsR), glycosyl transferase 
(spsQ), (spsI), glycosyl transferase (spsK), glycosyl transferase 
(spsL), (spsJ), (spsF), (spsD), (spsC), (spsE), Urf 32, Orf 26, 
ATP-binding cassette trans>., N = 1, Score = 291, P = 1e-25 

SWISSPROT:RFBD_RHISN PROBABLE DTDP-4-DEHYDRORHAMNOSE REDUCTASE (EC 
1.1.1.133) (DTDP-4-KETO-L-RHAMNOSE REDUCTASE) (DTDP-6-DEOXY-L-MANNOSE 
DEHYDROGENASE) (DTDP-L-RHAMNOSE SYNTHETASE)., N = 1, Score = 283, P = 
7.4e-25 


>PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans 
Length = 294 

HSPs: 

Score = 293 (44.0 bits), Expect = 6.4e-26, P = 6.4e-26 
Identities = 89/276 (32%), Positives = 151/276 (54%) 

Query: 30 RVLVTGATGLLGRAVHKEFQQNNWHAVGCGFRRARPKFEQVNLLDSNAVHHIIHDFQPHV 89 
R+L+TGA G LGR++ K N + V F ++++ F + V II F+P+V 
Sbjct: 3 RLLITGAGGQLGRSLAKLLVDNGRYEV------LALDFSELDITNKDMVFSIIDSFKPNV 56 

Query: 90 IVHCAAERRPDVVENQPDAASQLNVDASGNLAKEAAAVGAFLIYISSDYVFDG-TNPPYR 148 
I++ AA D E + +A +NV LA+ A + ++++S+DYVFDG + Y+ 
Sbjct: 57 IINAAAYTSVDQAELEVSSAYSVNVRGVQYLAEAAIRHNSAILHVSTDYVFDGYKSGKYK 116 

Query: 149 EEDIPAPLNLYGKTKLDGEKAVLENNLGAAVLRIPILYGEVEKLEESAVTVMFDKVQFSN 208 
E DI PL +YGK+K +GE+ +L + + +LR +GE + V M ++ + 
Sbjct: 117 ETDIIHPLCVYGKSKAEGERLLLTLSPKSIILRTSWTFGEYGN---NFVKTML-RLAKNR 172 

Query: 209 KSANMDHWQQRFPTHVKDVATVCRQLAEKRMLDPSIK-GTFHWSGNEQMTKYEMACAIAD 267 
+ Q PT+ D+A+V Q+AEK ++ ++K G +H++G ++ Y+ A AI D 
Sbjct: 173 DILGVVADQIGGPTYSGDIASVLIQIAEKIIVGETVKYGIYHFTGEPCVSWYDFAIAIFD 232 

Query: 268 AF-------NLPSSHLRPITDSPVLGAQRPRNAQLDCSKLE-TLGI 305 
N+P + D P L A+RP LD +K++ GI 
Sbjct: 233 EAVAQKVLENVPLVNAITTADYPTL-AKRPANSCLDLTKIQQAFGI 277 
Pedant information for DKFZphfbr2_6b24, frame 1 


Report for DKFZphfbr2_6b24.1 


[LENGTH] 334 
[MW] 37551.98 
[pI] 6.90 
[HOMOL] PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - Actinobacillus actinomycetemcomitans 6e-25 
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YGL001c] 6e-04 
[EC] 1.1.1.133 dTDP-4-dehydrorhamnose reductase 2e-16 
[PIRKW] lipopolysaccharide biosynthesis 2e-16 
[PIRKW] NADP 2e-16 
[PIRKW] oxidoreductase 2e-16 
[PIRKW] streptomycin biosynthesis 1e-19 
[SUPFAM] dTDP-dihydrostreptose synthase 1e-20 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 


SEQ MVGREKELSIHFVPGSCRLVEEEVNIPNRRVLVTGATGLLGRAVHKEFQQNNWHAVGCGF 

PRD cccccceeeccccccceeeeecccccccceeeeeccccchhhhhhhhhhhccceeeeecc 

SEQ RRARPKFEQVNLLDSNAVHHIIHDFQPHVIVHCAAERRPDVVENQPDAASQLNVDASGNL 

PRD cccccccccccccchhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhhhhccchhh 

SEQ AKEAAAVGAFLIYISSDYVFDGTNPPYREEDIPAPLNLYGKTKLDGEKAVLENNLGAAVL 

PRD hhhhhhhhheeeeeeccccccccccccccccccccccccchhhhhhhhhccccccceeee 

SEQ RIPILYGEVEKLEESAVTVMFDKVQFSNKSANMDHWQQRFPTHVKDVATVCRQLAEKRML 

PRD eeeeeecccccccchhhhhhhhhhhhhccceeeccccccccccchhhhhhhhhhhhhhhh 

SEQ DPSIKGTFHWSGNEQMTKYEMACAIADAFNLPSSHLRPITDSPVLGAQRPRNAQLDCSKL 

PRD cccccceeeeccccccchhhhhhhhhhhhhcccccccccccccccccccccccchhhhhh 

SEQ ETLGIGQRTPFRIGIKESLWPFLIDKRWRQTVFH 

PRD hhhhccccchhhhhhhhhhhhhhhhhhhhhcccc 


Prosite for DKFZphfbr2_6b24.1 


PS00001 208->212 ASN_GLYCOSYLATION PDOC00001 
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005 
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 
PS00006 251->255 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 298->302 CK2_PHOSPHO_SITE PDOC00006 
PS00008 314->320 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_6b24.1) 
DKFZphfbr2_6i20 


group: brain derived 

DKFZphfbr2_6i20 encodes a novel 296 amino acid protein with similarity to ribosomal protein 
L15 precursor of S. cerevisiae mitochondria. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to ribosomal protein L15 precursor, mitochondrial 

complete cDNA, complete cds, EST hits 
potential mitochondrial L15 ribosomal protein 

Sequenced by AGOWA 

Locus: /map="377.5 cR from top of Chr8 linkage group" 
Insert length: 1122 bp 

Poly A stretch at pos. 1099, polyadenylation signal at pos. 1071 


1 GGGGGCCCTT GAAAGTTCTT GGATCTGCGG GTTATGGCCG GTCCCTTGCA 
51 GGGCGGTGGG GCCCGGGCCC TGGACCTACT CCGGGGCCTG CCGCGTGTGA 
101 GCCTGGCCAA CTTAAAGCCG AATCCCGGCT CCAAGAAACC GGAGAGAAGA 
151 CCAAGAGGTC GGAGAAGAGG TAGAAAATGT GGCAGAGGCC ATAAAGGAGA 
201 AAGGCAAAGA GGAACCCGGC CCCGCTTGGG CTTTGAGGGA GGCCAGACTC 
251 CATTTTACAT CCGAATCCCA AAATACGGGT TTAACGAAGG ACATAGTTTC 
301 AGACGCCAGT ATAAGCCTAT GAGTCTCAAT AGACTGCAGT ATCTTATTGA 
351 TTTGGGTCGT GTTGATCCTA GTCAACCTAT TGACTTAACC CAGCTTGTCA 
401 ATGGGAGAGG TGTGACCATC CAGCCACTTA AAAGGGATTA TGATGTCCAG 
451 CTGGTTGAGG AGGGTGCTGA CACCTTTACG GCAAAAGTTA ATATTGAAGT 
501 ACAGTTGGCT TCAGAACTAG CTATTGCTGC CATTGAAAAA AATGGTGGTG 
551 TTGTTACTAC AGCCTTCTAT GATCCAAGAA GTCTGGACAT TGTATGCAAA 
601 CCTGTTCCAT TCTTTCTTCG TGGACAACCC ATTCCAAAAA GAATGCTTCC 
651 ACCAGAAGAA CTGGTACCAT ATTACACTGA TGCAAAGAAC CGTGGGTACC 
701 TGGCGGATCC TGCCAAATTT CCTGAAGCAC GACTTGAACT CGCCAGGAAG 
751 TATGGTTATA TCTTACCTGA TATCACTAAA GATGAACTCT TCAAAATGCT 
801 CTGTACTAGG AAGGATCCAA GGCAGATTTT CTTTGGTCTT GCTCCAGGAT 
851 GGGTGGTGAA TATGGCCGAT AAGAAAATCC TAAAACCTAC AGATGAAAAT 
901 CTCCTTAAGT ATTATACCTC ATGAATTCCC GTCCAAGGAA GCAGAGTTGT 
951 TAAAGAGTAC TGGAATAGGG GCTG/^^GGAT CTATATTCCC TTATTGCATT 
1001 TTCCTTATGT ATAATTTTCC AGATGGTGAT GTTACTTTTC AGTGTACTCA 
1051 TATGTCTCAT TTTCATCTAA AATTAAATGG CAGGAAACAA GGACTGCATA 
1101 GAGAAAAAAA AAAAAAAAAA AA 


BLAST Results 


Entry HS500354 from database EMBL: 
human STS WI-12392. 
Length = 426 
Minus Strand HSPs: 

Score = 1791 (268.7 bits), Expect = 1.1e-74, P = 1.1e-74 
Identities = 375/384 (97%) 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 34 bp to 921 bp; peptide length: 296 
Category: strong similarity to known protein 


1 MAGPLQGGGA RALDLLRGLP RVSLANLKPN PGSKKPERRP RGRRRGRKCG 
51 RGHKGERQRG TRPRLGFEGG QTPFYIRIPK YGFNEGHSFR RQYKPMSLNR 

101 LQYLIDLGRV DPSQPIDLTQ LVNGRGVTIQ PLKRDYDVQL VEEGADTFTA 

151 KVNIEVQLAS ELAIAAIEKN GGVVTTAFYD PRSLDIVCKP VPFFLRGQPI 

201 PKRMLPPEEL VPYYTDAKNR GYLADPAKFP EARLELARKY GYILPDITKD 

251 ELFKMLCTRK DPRQIFFGLA PGWVVNMADK KILKPTDENL LKYYTS 

BLASTP hits 

Entry S63258 from database PIR: 

ribosomal protein L15 precursor, mitochondrial - yeast (Saccharomyces 
cerevisiae) 

Length = 322 

Score = 259 (91.2 bits), Expect = 2.0e-22, P = 2.0e-22 
Identities = 71/200 (35%), Positives = 106/200 (53%) 

Entry H70161 from database PIR: 

ribosomal protein L15 (rplO) - Lyme disease spirochete 
Length = 145 

Score = 173 (60.9 bits), Expect = 4.8e-13, P = 4.8e-13 
Identities = 45/140 (32%), Positives = 73/140 (52%) 


Alert BLASTP hits for DKFZphfbr2_6i20, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6i20, frame 1 


Report for DKFZphfbr2_6i20.1 


[LENGTH] 296 
[MW] 33495.98 
[pI] 9.98 
[HOMOL] TREMBL:AF067212_1 gene: "F37F2.1"; Caenorhabditis elegans cosmid F37F2. 1e- 
[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YNL284c] 7e-15 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YNL284c] 7e-15 
[FUNCAT] j mrna translation and ribosome biogenesis [M. genitalium, MG169] 1e-06 
[BLOCKS] BL00475D 
[BLOCKS] BL00475B Ribosomal protein L15 proteins 
[PIRKW] ribosome 2e-13 
[PIRKW] mitochondrion 2e-13 
[PIRKW] protein biosynthesis 2e-13 
[SUPFAM] Escherichia coli ribosomal protein L15 4e-06 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 4 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 12.50 % 


SEQ MAGPLQGGGARALDLLRGLPRVSLANLKPNPGSKKPERRPRGRRRGRKCGRGHKGERQRG 

SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TRPRLGFEGGQTPFYIRIPKYGFNEGHSFRRQYKPMSLNRLQYLIDLGRVDPSQPIDLTQ 

SEG 

PRD ccccccccccccceeeeeccccccccccccccccccchhhhhhhhhccccccccccccee 

SEQ LVNGRGVTIQPLKRDYDVQLVEEGADTFTAKVNIEVQLASELAIAAIEKNGGVVTTAFYD 

SEG 

PRD ecccceeeeccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhccceeeeeecc 

SEQ PRSLDIVCKPVPFFLRGQPIPKRMLPPEELVPYYTDAKNRGYLADPAKFPEARLELARKY 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ GYILPDITKDELFKMLCTRKDPRQIFFGLAPGWVVNMADKKILKPTDENLLKYYTS 

SEG 

PRD cccccccchhhhhhhhhcccccceeeeeccccceeeeccceeecccchhhhhcccc 


Prosite for DKFZphfbr2_6i20.1 

PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005 
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005 
PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006 
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006 
PS00008 8->14 MYRISTYL PDOC00008 
PS00008 171->177 MYRISTYL PDOC00008 
PS00008 268->274 MYRISTYL PDOC00008 
PS00009 41->45 AMIDATION PDOC00009 
PS00009 45->49 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfbr2_6i20.1) 
DKFZphfbr2_6o17 


group: nucleic acid management 

DKFZphfbr2_6o17 encodes a novel 455 amino acid protein with strong similarity to DEAD-box 
ATP-dependent RNA helicases YHR065c and T26G10.1. 

The S. cerevisiae protein YHR065c is required for maturation of the 35S RNA primary 
transcript. 

The new protein can find application in modulating rRNA maturation. 


strong similarity to RNA helicases 
complete cDNA, complete cds, EST hits 

probable start at bp 27 matches Kozak consensus ANNatgG 
involved in maturation of r-RNA ?? 

YHR065c/Rrp3p is involved in maturation of the 35S primary transcript 
Drs1p cold-sensitive mutation has slow 27S to 25S pre-rRNA 
conversion and is deficient in 60S ribosomal subunits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1840 bp 

Poly A stretch at pos. 1815, polyadenylation signal at pos. 1793 


1 GGGGACTTCC GGAGACCTCA CACAAGATGG CGGCACCCGA GGAACACGAT 
51 TCTCCGACCG AAGCGTCCCA GCCGATTGTG GAAGAGGAGG AAACTAAAAC 
101 ATTTAAAGAC CTGGGTGTGA CAGATGTGTT GTGTGAAGCT TGTGACCAGT 
151 TGGGATGGAC AAAACCCACC AAGATTCAGA TTGAAGCTAT TCCTTTGGCC 
201 TTACAAGGTC GTGATATCAT TGGGCTTGCA GAAACTGGCT CTGGAAAGAC 
251 AGGCGCCTTT GCTTTGCCCA TTCTAAACGC ACTGCTGGAG ACCCCGCAGC 
301 GTTTGTTTGC CCTAGTTCTT ACCCCGACTC GGGAGCTGGC CTTTCAGATC 
351 TCAGAGCAGT TTGAAGCCCT GGGGTCCTCT ATTGGAGTGC AGAGTGCTGT 
401 GATTGTAGGT GGAATTGATT CAATGTCTCA ATCTTTGGCC CTTGCAAAAA 
451 AACCACATAT AATAATAGCA ACTCCTGGTC GACTGATTGA CCACTTGGAA 
501 AATACGAAAG GTTTCAACTT GAGAGCTCTC AAATACTTGG TCATGGATGA 
551 AGCCGACCGA ATACTGAATA TGGATTTTGA GACAGAGGTT GACAAGATCC 
601 TCAAAGTGAT TCCTCGAGAT CGGAAAACAT TCCTCTTCTC TGCCACCATG 
651 ACCAAGAAGG TTCAAAAACT TCAGCGAGCA GCTCTGAAGA ATCCTGTGAA 
701 ATGTGCCGTT TCCTCTAAAT ACCAGACAGT TGAAAAATTA CAGCAATATT 
751 ATATTTTTAT TCCCTCTAAA TTCAAGGATA CCTACCTGGT TTATATTCTA 
801 AATGAATTGG CTGGAAACTC CTTTATGATA TTCTGCAGCA CCTGTAATAA 
851 TACCCAGAGA ACAGCTTTGC TACTGCGAAA TCTTGGCTTC ACTGCCATCC 
901 CCCTCCATGG ACAAATGAGT CAGAGTAAGC GCCTAGGATC CCTTAATAAG 
951 TTTAAGGCCA AGGCCCGTTC CATTCTTCTA GCAACTGACG TTGCCAGCCG 
1001 AGGTTTGGAC ATACCTCATG TAGATGTGGT TGTCAACTTT GACATTCCTA 
1051 CCCATTCCAA GGATTACATC CATCGAGTAG GTCGAACAGC TAGAGCTGGG 
1101 CGCTCCGGAA AGGCTATTAC TTTTGTCACA CAGTATGATG TGGAACTCTT 
1151 CCAGCGCATA GAACACTTAA TTGGGAAGAA ACTACCAGGT TTTCCAACAC 
1201 AGGATGATGA GGTTATGATG CTGACAGAAC GCGTCGCTGA AGCCCAAAGG 
1251 TTTGCCCGAA TGGAGTTAAG GGAGCATGGA GAAAAGAAGA AACGCTCGCG 
1301 AGAGGATGCT GGAGATAATG ATGACACAGA GGGTGCTATT GGTGTCAGGA 
1351 ACAAGGTGGC TGGAGGAAAA ATGAAGAAGC GGAAAGGCCG TTAATCACTT 
1401 TTATGAAGGC TCGAGTTCTG CTGTTCTGTA AAAGAAAATT GGAGAATGAA 
1451 ACCTGCTCCA ACAGAGATCA TGAGACTGAA ATTGGTCAGA ATTGTGTCCA 
1501 GAATGTGCTC AGCTAATTCA GTATTCTTCC CCATTCTGGG TTGGAGTTTA 
1551 CTGCAGAGTA ATTCTTACAG TGCTGATGTC AAGACTGTTA CTGTTCTTCG 
1601 ACTTTGATTC CTTGCTCATG ACATGAGTAG GGTGTGCTCT TCTGTCACTT 
1651 CACACAGACC TTTTGCCTTT TTTAGCTGCA AGTCAAGGAC TAGGTTGATG 
1701 ATGCCCATGA CCTGTAATTG TAAAGAAGCT TGGACATCTG CAAATGATAT 
1751 TTAAACCATC TTGGCTTGTG CTTTATTCAA ACTAATGTGA AACAATAAAT 
1801 TTAAATATTA TTTTTAAAAG AAAAAAAAAA AAAAAAAAAA 
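
The reported polyadenylation signal position can be reproduced by scanning the 3' end of the insert for the canonical hexamers. This is a minimal sketch, not the annotation pipeline's own method: the choice of AATAAA and ATTAAA as the searched hexamers, the reading of the reported position as a 0-based offset, and the function name are all assumptions. The tail string is bases 1751 to 1840 of the sequence above.

```python
import re

# Lookahead so that overlapping hexamers are all reported.
SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def polya_signals(seq: str, offset: int = 0):
    """Return (0-based offset in the full insert, hexamer) for each signal found."""
    return [(m.start() + offset, m.group(1)) for m in SIGNAL.finditer(seq.upper())]


tail = (
    "TTAAACCATCTTGGCTTGTGCTTTATTCAAACTAATGTGAAACAATAAAT"
    "TTAAATATTATTTTTAAAAGAAAAAAAAAAAAAAAAAAAA"
)

# 1750 bases precede the tail in the full insert.
print(polya_signals(tail, offset=1750))
```

For this clone the scan finds a single AATAAA hexamer, at the position quoted in the report. Other inserts in this listing contain more than one candidate hexamer near the 3' end, so the rule by which the report picks one is not recoverable from the text.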


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 
Peptide information for frame 3 


ORF from 27 bp to 1391 bp; peptide length: 455 
Category: strong similarity to known protein 


1 MAAPEEHDSP TEASQPIVEE EETKTFKDLG VTDVLCEACD QLGWTKPTKI 
51 QIEAIPLALQ GRDIIGLAET GSGKTGAFAL PILNALLETP QRLFALVLTP 
101 TRELAFQISE QFEALGSSIG VQSAVIVGGI DSMSQSLALA KKPHIIIATP 
151 GRLIDHLENT KGFNLRALKY LVMDEADRIL NMDFETEVDK ILKVIPRDRK 
201 TFLFSATMTK KVQKLQRAAL KNPVKCAVSS KYQTVEKLQQ YYIFIPSKFK 
251 DTYLVYILNE LAGNSFMIFC STCNNTQRTA LLLRNLGFTA IPLHGQMSQS 
301 KRLGSLNKFK AKARSILLAT DVASRGLDIP HVDVVVNFDI PTHSKDYIHR 
351 VGRTARAGRS GKAITFVTQY DVELFQRIEH LIGKKLPGFP TQDDEVMMLT 
401 ERVAEAQRFA RMELREHGEK KKRSREDAGD NDDTEGAIGV RNKVAGGKMK 
451 KRKGR 
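
The annotation's remark that the probable start at bp 27 matches the Kozak consensus ANNatgG can be checked against the first bases of the insert. This is a minimal sketch: it tests only the reduced consensus quoted in the annotation (A three bases upstream of the ATG, G immediately after it), and the function name and the 1-based position convention are assumptions. The string is bases 1 to 50 of the insert sequence listed earlier in this entry.

```python
def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Check ANNatgG around an ATG whose A sits at 1-based position atg_pos."""
    seq = seq.upper()
    i = atg_pos - 1  # 0-based index of the A in ATG
    if i < 3 or i + 3 >= len(seq):
        return False  # not enough flanking sequence to test the consensus
    return seq[i:i + 3] == "ATG" and seq[i - 3] == "A" and seq[i + 3] == "G"


insert_head = "GGGGACTTCCGGAGACCTCACACAAGATGGCGGCACCCGAGGAACACGAT"
print(matches_kozak(insert_head, 27))
```

The ATG at position 27 is preceded by A at position 24 and followed by G at position 30, consistent with the annotation and with the reported ORF start.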

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6o17, frame 3 

PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis 
elegans, N = 1, Score = 1497, P = 1.6e-153 

PIR:S46713 hypothetical protein YHR065c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 1154, P = 3.6e-117 

TREMBL:ATH010462_1 gene: "RH10"; product: "RNA helicase"; Arabidopsis 
thaliana mRNA for DEAD box RNA helicase, RH10, N = 1, Score = 1122, P = 
8.9e-114 

TREMBL:AC002985_2 product: "R27090_2"; Human DNA from chromosome 
19-specific cosmid R27090, genomic sequence, complete sequence., N = 1, 
Score = 950, P = 1.5e-95 

>PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis 
elegans 

Length = 489 


Score = 1497 (224.6 bits), Expect = 1.6e-153, P = 1.6e-153 
Identities = 283/442 (64%), Positives = 364/442 (82%) 
Identities = 283/442 (64%), Positives = 364/442 (82%) 

Query: 19 EEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 78 
E+ + K+F +LGV+ LC+AC +LGW KP+KIQ A+P ALQG+D+IGLAETGSGKTGAF 
Sbjct: 39 EDVKEKSFAELGVSQPLCDACQRLGWMKPSKIQQAALPHALQGKDVIGLAETGSGKTGAF 98 

Query: 79 ALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIGVQSAVIVGGIDSMSQSLA 138 
A+P+L +LL+ PQ F LVLTPTRELAFQI +QFEALGS IG+ +AVIVGG+D +Q++A 
Sbjct: 99 AIPVLQSLLOHPQAFFCLVLTPTRELAFQIGQQFEALGSGIGLIAAVIVGGVDMAAQAMA 158 

Query: 139 LAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRILNMDFETEVDKILKVIPRD 198 
LA++PHII+ATPGRL+DHLENTKGFNL+ALK+L+MDEADRILNMDFE E+DKILKVIPR+ 
Sbjct: 159 LARRPHIIVATPGRLVDHLENTKGFNLKALKFLIMDEADRILNMDFEVELDKILKVIPRE 218 

Query: 199 RKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQYYIFIPSKFKDTYLVYIL 258 
R+T+LFSATMTKKV KL+RA+L++P + +VSS+Y+TV+ L+Q+YIF+P+K+K+TYLVY+L 
Sbjct: 219 RRTYLFSATMTKKVSKLERASLRDPARVSVSSRYKTVDNLKQHYIFVPNKYKETYLVYLL 278 

Query: 259 NELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILL 318 
NE AGNS ++FC+TC T + A++LR LG A+PLHGQMSQ KRLGSLNKFK+KAR IL+ 
Sbjct: 279 NEHAGNSAIVFCATCATTMQIAVMLRQLGMQAVPLHGQMSQEKRLGSLNKFKSKAREILV 338 

Query: 319 ATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRSGKAITFVTQYDVELFQRI 378 
TDVA+RGLDIPHVD+V+N+D+P+ SKDY+HRVGRTARAGRSG AIT VTQYDVE +Q+I 
Sbjct: 339 CTDVAARGLDIPHVDMVINYDMPSQSKDYVHRVGRTARAGRSGIAITVVTQYDVEAYQKI 398 

Query: 379 EHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEKKK-----RSREDAGDNDD 433 
E +GKKL + ++EVM+L ER EA AR+E++E EKKK R +D GD ++ 
Sbjct: 399 EAMLGKKLDEYKCVENEVMVLVERTQEATENARIEMKEMDEKKKSGKKRRQNDDFGDTEE 458 

Query: 434 TEGAIGVRNKVAGGKMKKRKGR 455 
+ G + K GG+ GR 
Sbjct: 459 SGGRFKMGIKSMGGRGGSGGGR 480 


Pedant information for DKFZphfbr2_6o17, frame 3 


Report for DKFZphfbr2_6o17.3 


[LENGTH] 455 
[MW] 50646.80 
[pI] 9.18 
[HOMOL] PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis elegans 1e-167 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YHR065c] 1e-127 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YHR065c] 1e-127 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YHR169w] 2e-79 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-71 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-66 
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-63 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 1e-58 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-55 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 5e-55 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 5e-55 
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 9e-48 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 2e-45 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 4e-42 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 7e-16 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 7e-12 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 7e-12 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-06 
[BLOCKS] BL00175B Phosphoglycerate mutase family phosphohistidine proteins 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 4e-60 
[PIRKW] RNA binding 7e-69 
[PIRKW] DEAD box 7e-69 
[PIRKW] transmembrane protein 9e-41 
[PIRKW] DNA binding 3e-55 
[PIRKW] recF recombination pathway 3e-11 
[PIRKW] ATP 1e-126 
[PIRKW] purine nucleotide binding 7e-69 
[PIRKW] P-loop 1e-126 
[PIRKW] hydrolase 1e-55 
[PIRKW] protein biosynthesis 7e-69 
[PIRKW] ATP binding 3e-61 
[SUPFAM] ATP-dependent RNA helicase eIF-4A 8e-06 
[SUPFAM] WW repeat homology 4e-58 
[SUPFAM] translation initiation factor eIF-4A 7e-69 
[SUPFAM] DEAD/H box helicase homology 1e-126 
[SUPFAM] recQ helicase homology 5e-12 
[SUPFAM] ATP-dependent RNA helicase homology 8e-06 
[SUPFAM] unassigned DEAD/H box helicases 1e-126 
[SUPFAM] ATP-dependent RNA helicase DBP1 4e-60 
[SUPFAM] ATP-dependent RNA helicase DHH1 1e-58 
[SUPFAM] recQ protein 3e-11 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 4e-58 
[SUPFAM] Bloom's syndrome helicase 5e-12 
[PROSITE] DEAD_ATP_HELICASE 1 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 


SEQ MAAPEEHDSPTEASQPIVEEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQ 

PRD cccccccccccccccchhhhhhhhhhhccccchhhhhhhhhhcccccccccccccccccc 

SEQ GRDIIGLAETGSGKTGAFALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIG 

PRD ccceeeeeccccccceeehhhhhhhhcccccceeeeeeccchhhhhhhhhhhhhhhhhcc 
SEQ VQSAVIVGGIDSMSQSLALAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRIL 

PRD eeeeeeeccchhhhhhhhhhccceeeeeccccccccccccccccccccceeehhhhhhhh 

SEQ NMDFETEVDKILKVIPRDRKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQ 

PRD hhcchhhhhhhhhhcccchhhhhhhhccchhhhhhhhhhhccceeeeeecccccchhhhh 

SEQ YYIFIPSKFKDTYLVYILNELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQS 

PRD hhhhhhhhhhhhhhhhhhhhhccceeeeeeecchhhhhhhhhhhhcccceeeccccchhh 

SEQ KRLGSLNKFKAKARSILLATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRS 

PRD hhhhhhhhhhhhhhhcchhhhhhhhcccccceeeeeecccccccceeeeecccccccccc 

SEQ GKAITFVTQYDVELFQRIEHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEK 

PRD cceeeeeecchhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KKRSREDAGDNDDTEGAIGVRNKVAGGKMKKRKGR 

PRD hhhhccccccccccccccccccccccccccccccc 


Prosite for DKFZphfbr2_6o17.3 


PS00001 274->278 ASN_GLYCOSYLATION PDOC00001 
PS00004 421->425 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005 
PS00005 72->75 PKC_PHOSPHO_SITE PDOC00005 
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005 
PS00005 229->232 PKC_PHOSPHO_SITE PDOC00005 
PS00005 276->279 PKC_PHOSPHO_SITE PDOC00005 
PS00005 300->303 PKC_PHOSPHO_SITE PDOC00005 
PS00005 354->357 PKC_PHOSPHO_SITE PDOC00005 
PS00005 360->363 PKC_PHOSPHO_SITE PDOC00005 
PS00005 400->403 PKC_PHOSPHO_SITE PDOC00005 
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006 
PS00006 25->29 CK2_PHOSPHO_SITE PDOC00006 
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006 
PS00006 368->372 CK2_PHOSPHO_SITE PDOC00006 
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006 
PS00006 424->428 CK2_PHOSPHO_SITE PDOC00006 
PS00008 66->72 MYRISTYL PDOC00008 
PS00008 71->77 MYRISTYL PDOC00008 
PS00008 116->122 MYRISTYL PDOC00008 
PS00008 120->126 MYRISTYL PDOC00008 
PS00008 128->134 MYRISTYL PDOC00008 
PS00009 382->386 AMIDATION PDOC00009 
PS00017 68->76 ATP_GTP_A PDOC00017 
PS00039 172->181 DEAD_ATP_HELICASE PDOC00039 


Pfam for DKFZphfbr2_6ol7 . 3 


HMM^NAME DEAD and DEAH box helicases 

HMM ♦ gLpPWILRnI yeMGFEkPTPIQQqAI Pi ILeGRDVMACAQTGSGKTAAF 

G 4+ ++++++++G++KPT+IQ +AIP++L+GRD+++ A TGSGKT+AF 
Query 30 GVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 78 

HMM lIPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR 
++P+L ++++P + ++AL+L+PTRELA QI+E+++++G++++ ++ 

Query 79 ALPILNALLETP QR-LFALVLTPTRELAFQISEQFEALGSSIG-VQ 122 

HMM ImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIER.gtldLDrleML 
+++I+GG + + Q L+++P HI+IATPGRLIDH+E+ ++L+++++L 
Query 123 SAVIVGGIDSMSQSLALAKKP-HIIIATPGRLIDHLENTKGFNLRALKYL 171 

HMM VMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARrF 

VMDEADR+L+M+F+ +++ + I + + IP ++R T +FSATM++++Q+L+R+ 
Query 172 VMDEADRILNMDFETEVDKILKVIP— RDEIKTFLFSATMTKKVQKLQRAA 219 

HMM MRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

++NP+ ++ ++++T++ ++Q+YI+++ + K +L+++++ 
Query 220 LKNPVKCAVSSKYQTVE-KLQQYYIFIP-SKFKDTYLVYILN 259 

HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknlGI rvmYIHGdMpQeERdelMddFNnGEynVLIcTDVggR 
++ + L+NLG++++ +HG+M+Q +R+ +++r++ +L++TDV++R
Query 277 QRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILLATDVASR 325 

HMM GI DI PdVNHVINYDMPWNPEqYI QRIGRTgRIG* 

G+DIP V++V+N+D+P ++ +Yr+R+GRT+R+G 
Query 326 GLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAG 358 
DKFZphfbr2_71o20


group: brain derived 

DKFZphfbr2_71o20 encodes a novel 232 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC006186 (3 exons) 

Sequenced by GBF 

Locus: /map="10q22.1"

Insert length: 1768 bp 

Poly A stretch at pos. 1742, polyadenylation signal at pos. 1726 


1 GGGGGCAGCA GGCCAAGGGG GAGGTGCGAG CGTGGACCTG GGACGGGTCT 
51 GGGCGGCTCT CGGTGGTTGG CACGGGTTCG CACACCCATT CAAGCGGCAG 
101 GACGCACTTG TCTTAGCAGT TCTCGCTGAC CGCGCTAGCT GCGGCTTCTA 
151 CGCTCCGGCA CTCTGAGTTC ATCAGCAAAC GCCCTGGCGT CTGTCCTCAC 
201 CATGCCTAGC CTTTGGGACC GCTTCTCGTC GTCGTCCACC TCCTCTTCGC 
251 CCTCGTCCTT GCCCCGAACT CCCACCCCAG ATCGGCCGCC GCGCTCAGCC 
301 TGGGGGTCGG CGACCCGGGA GGAGGGGTTT GACCGCTCCA CGAGCCTGGA 
351 GAGCTCGGAC TGCGAGTCCC TGGACAGCAG CAACAGTGGC TTCGGGCCGG 
401 AGGAAGACAC GGCTTACCTG GATGGGGTGT CGTTGCCCGA CTTCGAGCTG 
451 CTCAGTGACC CTGAGGATGA ACACTTGTGT GCCAACCTGA TGCAGCTGCT 
501 GCAGGAGAGC CTGGCCCAGG CGCGGCTGGG CTCTCGACGC CCTGCGCGCC 
551 TGCTGATGCC TAGCCAGTTG GTAAGCCAGG TGGGCAAAGA ACTACTGCGC 
601 CTGGCCTACA GCGAGCCGTG CGGCCTGCGG GGGGCGCTGC TGGACGTCTG 
651 CGTGGAGCAG GGCAAGAGCT GCCACAGCGT GGGCCAGCTG GCACTCGACC 
701 CCAGCCTGGT GCCCACCTTC CAGCTGACCC TCGTGCTGCG CCTGGACTCA 
751 CGACTCTGGC CCAAGATCCA GGGGCTGTTT AGCTCCGCCA ACTCTCCCTT 
801 CCTCCCTGGC TTCAGCCAGT CCCTGACGCT GAGCACTGGC TTCCGAGTCA 
851 TCAAGAAGAA GCTGTACAGC TCGGAACAGC TGCCCATTGA GGAGTGTTGA 
901 ACTTCAACCT GAGGGGGCCG ACAGTGCCCT CCAAGACAGA GACGACTGAA 
951 CTTTTGGGGT GGAGACTAGA GGCAGGAGCT GAGGGACTGA TTCCAGTGGT 
1001 TGGAAAACTG AGGCAGCCAC CTAAAGTGGA GGTGGGGGAA TAGTGTTTCC 
1051 CAGGAAGCTC ATTGAGTTGT GTGCGGGTGG CTCTGCATTG GGGACACATA 
1101 CCCCTCAGTA CTGTAGCATG AAACAAAGGC TTAGGGGCCA ACAAGGCTTC 
1151 CAGCTGGATG TGTGTGTAGC ATGTACCTTA TTATTTTTGT TACTGACAGT 
1201 TAACAGTGGT GTGACATCCA GAGAGCAGCT GGGCTGCTCC CGCCCCAGCC 
1251 TGGCCCAGGG TGAAGGAAGA GGCACGTGCT CCTCAGAGCA GCCGGAGGGA 
1301 AGGGGGAGGT CGGAGGTCGT GGAGGTGGTT TGTGTATCTT ACTGGTCTGA
1351 AGGGACCAAG TGTGTTTGTT GTTTGTTTTG TATCTTGTTT TTCTGATCGG 
1401 AGCATCACTA CTGACCTGTT GTAGGCAGCT ATCTTACAGA CGCATGAATG 
1451 TAAGAGTAGG AAGGGGTGGG TGTCAGGGAT CACTTGGGAT CTTTGACACT 
1501 TGAAAAATTA CACCTGGCAG CTGCGTTTAA GCCTTCCCCC ATCGTGTACT 
1551 GCAGAGTTGA GCTGGCAGGG GAGGGGCTGA GAGGGTGGGG GCTGGAACCC 
1601 CTTCCCGGGA GGAGTGCCAT CTGGGTCTTC CATCTAGAAC TGTTTACATG 
1651 AAGATAAGAT ACTCACTGTT CATGAATACA CTTGATGTTC AAGTATTAAG 
1701 ACCTATGCAA TATTTTTTAC TTTTCTAATA AACATGTTTG TTAAAACAAA 
1751 AAAAAAAAAA AAAAAAAA 


BLAST Results 


Entry AC006186 from database EMBLNEW: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 10 clone 
CRI-JC2048 map 10q22.1; HTGS phase 1, 4 unordered pieces. 
Score = 6512, P = 0.0e+00, identities = 1326/1345
3 exons 


Medline entries 


No Medline entry 
Peptide information for frame 1


ORF from 202 bp to 897 bp; peptide length: 232 
Category: putative protein 


1 MPSLWDRFSS SSTSSSPSSL PRTPTPDRPP RSAWGSATRE EGFDRSTSLE 

51 SSDCESLDSS NSGFGPEEDT AYLDGVSLPD FELLSDPEDE HLCANLMQLL

101 QESLAQARLG SRRPARLLMP SQLVSQVGKE LLRLAYSEPC GLRGALLDVC 

151 VEQGKSCHSV GQLALDPSLV PTFQLTLVLR LDSRLWPKIQ GLFSSANSPF 

201 LPGFSQSLTL STGFRVIKKK LYSSEQLPIE EC 
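The relationship between the nucleotide listing and the peptide listing above can be checked with a short translation sketch. It assumes the standard genetic code and reads the two excerpts (positions 202 to 231 and 886 to 900 of the insert) directly from the sequence listing; the function and constant names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the ORF is translated with the standard nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First ten codons of the ORF (insert positions 202-231).
orf_start = "ATGCCTAGC CTTTGGGACC GCTTCTCGTC G"
# Last four codons plus the stop codon (insert positions 886-900).
orf_end = "ATTGA GGAGTGTTGA"

print(translate(orf_start))
print(translate(orf_end))
```

The first call reproduces the N-terminal residues of the peptide listing and the second its C-terminal residues, with translation ending at the TGA codon that follows position 897.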

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_71o20, frame 1 
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_71o20, frame 1 


Report for DKFZphfbr2_71o20.1

[LENGTH] 232
[MW] 25354.60
[pI] 4.87
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 17.67 %


SEQ MPSLWDRFSSSSTSSSPSSLPRTPTPDRPPRSAWGSATREEGFDRSTSLESSDCESLDSS 

SEG xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ NSGFGPEEDTAYLDGVSLPDFELLSDPEDEHLCANLMQLLQESLAQARLGSRRPARLLMP 

SEG XX 

PRD cccccccccccccccccccceeeccccccchhhhhhhhhhhhhhhhhhccccccceeecc 

SEQ SQLVSQVGKELLRLAYSEPCGLRGALLDVCVEQGKSCHSVGQLALDPSLVPTFQLTLVLR 

SEG 

PRD ccccchhhhhhhhhhhcccccchhhhhhhhccccccccccccccccccccchhhhhhccc 

SEQ LDSRLWPKIQGLFSSANSPFLPGFSQSLTLSTGFRVIKKKLYSSEQLPIEEC 

SEG 

PRD cccccccccccccccccccccccccceeeecccccccccccccccccccccc 
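The low-complexity percentage reported in the keyword list can be cross-checked against the SEG rows above, which mask 41 residues in total. The sketch below assumes the percentage is simply masked residues divided by peptide length; the function name is hypothetical and not part of the Pedant output.

```python
def low_complexity_percent(masked_residues: int, length: int) -> float:
    """Share of residues masked by SEG, as a percentage rounded to 2 places."""
    if length <= 0 or not 0 <= masked_residues <= length:
        raise ValueError("masked_residues must lie between 0 and length")
    return round(100.0 * masked_residues / length, 2)


# 41 masked positions in a 232-residue peptide.
print(low_complexity_percent(41, 232))
```

Under this assumption the value agrees with the 17.67 % given in the report.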


Prosite for DKFZphfbr2_71o20.1


PS00002    62->66      GLYCOSAMINOGLYCAN    PDOC00002
PS00005    111->114    PKC_PHOSPHO_SITE     PDOC00005
PS00006    3->7        CK2_PHOSPHO_SITE     PDOC00006
PS00006    38->42      CK2_PHOSPHO_SITE     PDOC00006
PS00006    47->51      CK2_PHOSPHO_SITE     PDOC00006
PS00006    52->56      CK2_PHOSPHO_SITE     PDOC00006
PS00006    77->81      CK2_PHOSPHO_SITE     PDOC00006
PS00006    85->89      CK2_PHOSPHO_SITE     PDOC00006
PS00008    141->147    MYRISTYL             PDOC00008
PS00008    191->197    MYRISTYL             PDOC00008


(No Pfam data available for DKFZphfbr2_71o20.1)
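The phosphorylation-site positions listed for this peptide can be reproduced with a simple pattern scan. The sketch assumes the PROSITE consensus patterns [ST]-x-[RK] for PS00005 and [ST]-x(2)-[DE] for PS00006, and it assumes that the listing reports a 1-based start followed by start plus motif length; that convention is inferred from the table, not stated in it. All names are illustrative.

```python
import re

# Peptide of DKFZphfbr2_71o20, frame 1 (232 residues).
PEPTIDE = (
    "MPSLWDRFSSSSTSSSPSSLPRTPTPDRPPRSAWGSATREEGFDRSTSLESSDCESLDSS"
    "NSGFGPEEDTAYLDGVSLPDFELLSDPEDEHLCANLMQLLQESLAQARLGSRRPARLLMP"
    "SQLVSQVGKELLRLAYSEPCGLRGALLDVCVEQGKSCHSVGQLALDPSLVPTFQLTLVLR"
    "LDSRLWPKIQGLFSSANSPFLPGFSQSLTLSTGFRVIKKKLYSSEQLPIEEC"
)

# PROSITE consensus patterns written as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan(peptide: str, regex: str) -> list:
    """Return (start, end) pairs; a lookahead keeps overlapping matches."""
    hits = []
    for m in re.finditer("(?=(%s))" % regex, peptide):
        start = m.start() + 1                     # 1-based start
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print("%s %d->%d" % (name, start, end))
```

Run on the 232-residue peptide, the scan yields one PKC site and six CK2 sites at the positions given in the Prosite table.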
DKFZphfbr2_72bl8


group: nucleic acid management 

DKFZphfbr2_72bl8 encodes a novel 715 amino acid protein with similarity to E. coli
DNA-damage-inducible protein dinP and other proteins induced by DNA damage.

The novel protein is similar to dinP of E. coli, yqjH of B. subtilis, dinP of M. tuberculosis
and T19K24-15 of A. thaliana. The dinB/P pathway is a second SOS pathway in E. coli. Therefore
the new gene seems to be involved in DNA repair.

The new protein can find application in modulating DNA repair and mutagenesis. 


similarity to DNA damage induced genes 

complete cDNA, complete cds, potential start at bp 49, EST hits
localisation primer site B is missing! 

Sequenced by LMU 

Locus: /map="416.0 cR from top of ChrlB linkage group"??
Insert length: 2475 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2431 


1 GGGGGAGGAA GGCGGCGGCG ACGACGAGGA AGACGCCGAG GCCTGGGCCA 
51 TGGAACTGGC GGACGTGGGG GCGGCAGCCA GCTCGCAGGG AGTTCATGAT 
101 CAAGTGTTGC CCACACCAAA TGCTTCATCC AGAGTCATAG TACATGTGGA 
151 TCTGGATTGC TTTTATGCAC AAGTAGAAAT GATCTCAAAT CCAGAGCTAA 
201 AAGACAAACC TTTAGGGGTT CAACAGAAAT ATTTGGTGGT TACCTGCAAC 
251 TATGAAGCTA GGAAACTTGG AGTTAAGAAA CTTATGAATG TCAGAGATGC 
301 AAAAGAAAAG TGTCCACAGT TGGTATTAGT TAATGGAGAA GACCTGACCC 
351 GCTACAGAGA AATGTCTTAT AAGGTTACAG AATTACTGGA AGAATTTAGT 
401 CCAGTTGTTG AGAGACTTGG ATTTGATGAA AATTTTGTGG ATCTAACAGA 
451 AATGGTTGAG AAGAGACTAC AGCAGCTGCA AAGTGATGAA CTTTCTGCGG
501 TGACTGTGTC GGGTCATGTA TACAATAATC AGTCTATAAA CCTGCTTGAC 
551 GTCTTGCACA TCAGACTACT TGTTGGATCT CAGATTGCAG CAGAGATGCG 
601 GGAAGCCATG TATAATCAGT TGGGGCTCAC TGGCTGTGCT GGAGTGGCTT 
651 CTAATAAACT GTTGGCAAAA TTAGTTTCTG GTGTCTTTAA ACCAAATCAA 
701 CAAACAGTCT TATTACCTGA AAGTTGTCAA CATCTTATTC ATAGTTTGAA 
751 TCACATAAAG GAAATACCTG GTATTGGCTA TAAAACTGCC AAATGTCTTG 
801 AAGCACTGGG TATCAATAGT GTGCGTGATC TCCAAACCTT TTCACCCAAA 
851 ATTTTAGAAA AAGAATTAGG AATTTCAGTT GCTCAGCGTA TCCAAAAGCT 
901 CAGTTTTGGA GAGGATAACT CCCCTGTGAT ACTCTCAGGA CCACCTCAGT 
951 CCTTTAGTGA AGAAGATTCA TTTAAAAAAT GTACATCTGA AGTTGAAGCT 
1001 AAAAATAAGA TTGAAGAACT ACTTGCTAGT CTTTTAAACA GAGTATGCCA 
1051 AGATGGAAGG AAGCCTCATA CAGTGAGATT AATAATCCGT CGGTATTCCT 
1101 CTGAGAAGCA CTATGGTCGT GAGAGTCGTC AGTGCCCTAT TCCTTCACAT 
1151 GTAATTCAGA AATTAGGGAC AGGAAATTAT GATGTGATGA CCCCAATGGT 
1201 TGATATACTT ATGAAACTTT TTCGAAATAT GGTGAATGTG AAGATGCCAT 
1251 TTCACCTTAC CCTTCTAAGT GTGTGCTTCT GCAACCTTAA AGCACTAAAT 
1301 ACTGCTAAGA AAGGGCTTAT TGATTATTAT TTAATGCCAT CATTATCAAC 
1351 TACTTCACGC TCTGGCAAGC ACAGTTTTAA AATGAAAGAC ACTCATATGG 
1401 AAGATTTTCC CAAAGACAAA GAAACAAACC GGGATTTCCT ACCAAGTGGA 
1451 AGAATTGAAA GTACAAGAAC TAGGGAGTCT CCACTAGATA CCACAAATTT
1501 TTCTAAAGAA AAAGACATTA ATGAATTCCC ACTCTGTTCA CTTCCTGAAG 
1551 GTGTTGACCA AGAAGTCTCC AAGCAGCTTC CAGTAGATAT TCAAGAAGAA
1601 ATCCTTTCTG GAAAATCTAG GGAAAAATTT CAAGGGAAAG GAAGTGTGAG 
1651 TTGTCCATTA CATGCCTCTA GAGGAGTATT ATCTTTCTTT TCTAAAAAAC 
1701 AAATGCAAGA TATTCCCATA AATCCTAGAG ATCATTTATC CAGTAGCAAA 
1751 CAGGTATCCT CTGTATCTCC TTGTGAACCG GGAACATCAG GCTTTAATAG 
1801 CAGTAGTTCT TCTTACATGT CTAGCCAAAA GGATTATTCA TATTATTTAG 
1851 ATAATAGATT AAAAGATGAA CGAATAAGTC AAGGACCTAA AGAACCTCAA
1901 GGATTCCACT TTACAAATTC AAACCCTGCT GTGTCTGCTT TTCATTCATT 
1951 TCCAAACTTG CAGAGTGAGC AACTTTTCTC CAGAAACCAC ACTACAGATA 
2001 GCCATAAGCA AACAGTAGCA ACAGACTCTC ATGAAGGACT TACAGAAAAT 
2051 AGAGAGCCAG ATTCTGTTGA TGAGAAAATT ACTTTCCCTT CTGACATTGA 
2101 TCCTCAAGTT TTCTATGAAC TACCAGAAGC AGTACAAAAG GAACTGCTGG 
2151 CAGAGTGGAA GAGAACAGGA TCAGATTTCC ACATTCGACA TAAATAAGCA 
2201 TATTCAGCAA AAAGGTCTGA AAAGCAAGGG AATACCATTA TTTTCGGATT 
2251 AGCGGTTTAT TAAGCTCTTC TATATTAAAC ACTAATAGAT ATTCAATAAC 
2301 GGAGTAAACT GTTCCAGATA AAGCAAGAAT AGTTGCAAGA AGTAAATTCT 
2351 GGCACAAAGC GTAAAAATAT AACAGAAGAA ATAATGTAAA ATACTATCTT 
2401 TTATGTCTAA AGCCATTTTA TATTACTTTT CAATAAAAAG AATATCATGG 
2451 TCAAAAAAAA AAAAAAAAAA AAAAC 


BLAST Results 
Entry HS086339 from database EMBL:
human STS WI-11064.
Score = 1523, P = 3.0e-64, identities = 327/343


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 50 bp to 2194 bp; peptide length: 715 
Category: similarity to known protein 


1 MELADVGAAA SSQGVHDQVL PTPNASSRVI VHVDLDCFYA QVEMISNPEL 
51 KDKPLGVQQK YLVVTCNYEA RKLGVKKLMN VRDAKEKCPQ LVLVNGEDLT 
101 RYREMSYKVT ELLEEFSPVV ERLGFDENFV DLTEMVEKRL QQLQSDELSA
151 VTVSGHVYNN QSINLLDVLH IRLLVGSQIA AEMREAMYNQ LGLTGCAGVA
201 SNKLLAKLVS GVFKPNQQTV LLPESCQHLI HSLNHIKEIP GIGYKTAKCL
251 EALGINSVRD LQTFSPKILE KELGISVAQR IQKLSFGEDN SPVILSGPPQ
301 SFSEEDSFKK CTSEVEAKNK IEELLASLLN RVCQDGRKPH TVRLIIRRYS
351 SEKHYGRESR QCPIPSHVIQ KLGTGNYDVM TPMVDILMKL FRNMVNVKMP 
401 FHLTLLSVCF CNLKALNTAK KGLIDYYLMP SLSTTSRSGK HSFKMKDTHM 
451 EDFPKDKETN RDFLPSGRIE STRTRESPLD TTNFSKEKDI NEFPLCSLPE 
501 GVDQEVSKQL PVDIQEEILS GKSREKFQGK GSVSCPLHAS RGVLSFFSKK 
551 QMQDIPINPR DHLSSSKQVS SVSPCEPGTS GFNSSSSSYM SSQKDYSYYL 
601 DNRLKDERIS QGPKEPQGFH FTNSNPAVSA FHSFPNLQSE QLFSRNHTTD 
651 SHKQTVATDS HEGLTENREP DSVDEKITFP SDIDPQVFYE LPEAVQKELL 
701 AEWKRTGSDF HIGHK 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72bl8, frame 2 

PIR:H64747 DNA-damage-inducible protein dinP - Escherichia coli, N =
2, Score = 212, P = 4.2e-27

PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis,
N = 2, Score = 230, P = 5.2e-26


>PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis 
Length = 414

HSPs:

Score = 230 (34.5 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26
Identities = 47/112 (41%), Positives = 73/112 (65%)

Query: 27 SRVIVHVDLDCFYAQVEMXSNPELKDKPLGV QQKYLWTCNYEARKLGVKKLMIIV 81 

SR+I H+D++ FYA VEM +P L+ KP+ V ++K +WTC+YEAR GVK M V 

Sbjct: 5 SRIIFHIDMNSFYASVEMAYDPALRGKPVAVAGMVKERKGIWTCSYEARARGVKTTMPV 64 

Query: 82 RDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPWERLGFDENFVDLTE 134 

AK CP+L+++ + RYR S + 4L E++ +VE + DE f+D+T+ 
Sbjct: 65 WQAKRHCPELIVLP-PNFDRYRNSSRAMFTILREYTDLVEPVSIDEGYMDMTD 116 

Score = 137 (20.6 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26
Identities = 43/148 (29%), Positives = 75/148 (50%)

Query: 178 QIAAEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIK 237 
+ A E++ + +L L G+A HK LAK+ S + KP T+L ++ L + 

Sbjct: 125 ETAKEIQSRLQKELLLPSSIGIAPNKFLAKMASDMKKPLGITILRKRQVPDILWPLP-VG 183 

Query: 238 EIPGIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSG 297 

E+ G+G KTA+ L+ LGI4++ +L L++ LGI+ R++ + G ++PV 
Sbjct: 184 EMHGVGKKTAEKLKGLGIHTIGELAAADEHSLKRLLGIN-GPRLKNKANGIHHAPV 238 

Query: 298 PPQSFSEEDSFKKCTSEVEAKNKIEELL 325 
P+ E S + EELL 
Sbjct: 239 DPERIYEFKSVGNSSTLSHDSSDEEELL 266

Pedant information for DKFZphfbr2_72bl8, frame 2 
Report for DKFZphfbr2_72bl8.2

[LENGTH] 715
[MW] 80300.63
[pI] 6.37
[HOMOL] TREMBL:SPBC16A3_11 gene: "SPBC16A3.11"; product: "hypothetical protein"; S.pombe chromosome II cosmid c16A3. 5e-30
[FUNCAT] dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDR419w] 2e-15
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. genitalium, MG360] 3e-13
[PIRKW] SOS mutagenesis 2e-11
[PIRKW] DNA repair 2e-11
[PIRKW] induced mutagenesis 2e-11
[SUPFAM] umuC protein 3e-29
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 21
[PROSITE] ASN_GLYCOSYLATION 5
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.20 %

SEQ MELADVGAAASSQGVHDQVLPTPNASSRVIVHVDLDCFYAQVEMISNPELKDKPLGVQQK
SEG
PRD ccceeeeeeecccccceeeccccccceeeeeeeccchhhhhhhhhcccccccTC

SEQ YLVVTCNYEARKLGVKKLMNVRDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVV
SEG
PRD ceeeehhhhhhhhhhcccchhhhhhhhccceeeeccccccchhhhhhhhhhhh^

SEQ ERLGFDENFVDLTEMVEKRLQQLQSDELSAVTVSGHVYNNQSINLLDVLHIRLLVGSQIA
SEG
PRD eeeccchhhhhhhhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhhhhhhhhh

SEQ AEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIKEIP
SEG
PRD hhhhhhhhhhhcceeeeccchhhhhhhhhhhhhcccceeeeecchhhhhhhhhccccccc

SEQ GIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSGPPQ
SEG
PRD ccchhhhhhhhhhccccchhhhhhhhhhhhhhccchhhhhhhhhhcccccceeeeccccc

SEQ SFSEEDSFKKCTSEVEAKNKIEELLASLLNRVCQDGRKPHTVRLIIRRYSSEKHYGRESR
SEG
PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccccceeeehhhhhhhhhh^

SEQ QCPIPSHVIQKLGTGNYDVMTPMVDILMKLFRNMVNVKMPFHLTLLSVCFCNLKALNTAK
SEG
PRD ccccccceeeeccccccccchhhhhhhhhhhhhhhhhcccceeeeeeeeechhhhhhhhh

SEQ KGLIDYYLMPSLSTTSRSGKHSFKMKDTHMEDFPKDKETNRDFLPSGRIESTRTRESPLD
SEG
PRD hhhheeeecccccccccccccceeeccccccccccccccccccccccccccccccccc^^

SEQ TTNFSKEKDINEFPLCSLPEGVDQEVSKQLPVDIQEEILSGKSREKFQGKGSVSCPLHAS
SEG
PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhcccceeeeecccccccchh^

SEQ RGVLSFFSKKQMQDIPINPRDHLSSSKQVSSVSPCEPGTSGFNSSSSSYMSSQKDYSYYL
SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx
PRD hcccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhh

SEQ DNRLKDERISQGPKEPQGFHFTNSNPAVSAFHSFPNLQSEQLFSRNHTTDSHKQTVATDS
SEG
PRD ^»hhhhhhhhhcccccccceeeeccccceeecccccccchhhhhhhccccccceee^

SEQ HEGLTENREPDSVDEKITFPSDIDPQVFYELPEAVQKELLAEWKRTGSDFHIGHK
SEG
PRD ccccccccccccccccccccccccceeehhhhhhhhhhhhhhhhhccccccccc^
Prosite for DKFZphfbr2_72bl8.2


foUUUU J. 

2 

q — 

roUUUUX 
HO J

->HO 1 

DC nf\f\f\ 1 

DO J 

~>587 

FbUUUU X 

64 6 

->650 

roUUUUfi 

309 

->sis 

PSUUUU4 

347 

->351 


2 

6->29 


106 

->109 


201 

->204 

n A c 

246 

->249 

PS00005 

257 

->260 

PS00005 

26S 

->2S8 

PS00005 
->310

PS00005 
PS00005

418- 

->421 

PS00005 
->438
->441

PSOOOOS 

442- 

->445 

PSOOOOS 

4S9 

->462 

PSOOOOS 

466- 

->469 

PSOOOOS 

471- 

->474 

PSOOOOS 

S20- 

->523 

PSOOOOS 

548- 

->SS1 

PSOOOOS 

565- 

->568 

PSOOOOS 

592- 

->595 

PSOOOOS 

651- 

->S54 

PS00006 

46->50 

PSOOOOS 

257- 

->261 

PS00006 

285- 

->289 

PS00006 

301- 

->305 

PS00006 

303- 

•>307 

PSOOOOS 

313- 

■>317 

PS00006 

448- 

■>452 

PS00006 

459- 

•>463 

PS00006 

477- 

■>481 

PS000O6 

497- 

>501 

PSOOOOS 

573- 

>S77 

PSOOOOS 

592- 

>596 

PSOOOOS 

672- 

>S76 

PSOOOOS 

681- 

>685 

PSOOOOS 

706- 

>710 

PS00007 

101- 

>108 

PS00007 

348- 

>356 

PSOOOOS 

7 

->13 

PSOOOOS 

176- 

>182 

PSOOOOS 

192- 

>198 

PSOOOOS 

198- 

>204 

PSOOOOS 

274- 

>280 

PSOOOOS 

663- 

>669 

PS00009 

335- 

>339 

PS00013 

186- 

>197 


ASN_GLYCOSYLATION 

ASN_GLYC0SYLATION 

ASNGLYCOSYLATION 

ASN_GLYCOSYLATION 

ASN_GLYCOSYLATION 

CAMP_PHOSPHO_SITE 

CAMP_PHOSPHO_SITE 

PKC_PHOS PHO_S I TE 

PKC_PH0SPHO SITE 

PKC PHOSPHO'SITE 

PKC~PHOSPHO~SITE 

PKC_PHOSPHO_S I TE 

PKC_PH0SPHO_SITE 

PKC_PHOSPHO_SITE 

PKC_PHOSPHO_SITE 

PKC_PHOS PHO_S ITE 

PKCPHOSPHOSITE 

PKCPHOS PHO_S ITE 

PKC_PHOSPHO_SITE 

PKC_PHOS PHO_S ITE 

PKC_PHOS PHO_S ITE 

PKC_PHOSPHO_S I TE 

PKC_PHOSPHO_SITE 

PKC_PHOS PHOS ITE 

PKC_PH0SPHO_SITE 

PKC PHOS PHO_S ITE 

PKC_PHOSPHO_SITE 

PKCPHOS PHO_S ITE 

CK2_PH0SPHO_SITE 

CK2 PHOSPHO_SITE 

CK2~PHOS PHO_S ITE 

CK2_PHOSPHO_SITE 

CK2_PH0SPH0_SITE 

CK2_PH0SPH0_SITE 

CK2_PH0SPH0 SITE 

CK2_PH0SPH0~SITE 

CK2_PH0SPH0_SITE 

CK2_PH0S PHO_S ITE 

CK2_PH0SPHO_SITE 

C K2_PHOS PHO_S I TE 

CK2_PH0SPH0_S ITE 

CK2_PHOS PHO_S ITE 

CK2_PH0SPH0_SITE 

TYRPHOSPHO SITE 

T YRPHOS PHO^S ITE 

MYRISTYL 

MYRISTYL 

MYRISTYL 

MYRISTYL 

MYRISTYL 

MYRISTYL 

AMI DAT I ON 

PROKAR LIPOPROTEIN 


PDOC00001
PDOC00001
PDOC00001
PDOC00001
PDOC00001
PDOC00004
PDOC00004
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00005
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00006
PDOC00007
PDOC00007
PDOC00008
PDOC00008
PDOC00008
PDOC00008
PDOC00008
PDOC00008
PDOC00009
PDOC00013


(No Pfam data available for DKFZphfbr2_72bl8.2) 
DKFZphfbr2_72dl3


group: brain derived 

DKFZphfbr2_72dl3 encodes a novel 165 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

seems to be testis specific; 9 of 10 EST hits are from testis libraries
Sequenced by LMU 
Locus : unknown 
Insert length: 723 bp 

Poly A stretch at pos. 704, no polyadenylation signal found

1 AGGGGGGGTA TGGGGGAGGG GGAGACTCTG CAGGAGCCTA ATTCCCCACT 
51 CTGAGCTCAC CCTTCTGTCT GCCCGGGCCC TACCCCTTCC CCTACTCTCA 
101 CCCTTATAAT CCTTTTCAGC ACTAGGTCTT CCCGTCACCT CCACCTCTCT 
151 CCATGACCCG GCTCTGCTTA CCCAGACCCG AAGCACGTGA GGATCCGATC 
201 CCAGTTCCTC CAAGGGGCCT GGGTGCTGGG GAGGGGTCAG GTAGTCCAGT 
251 GCGTCCACCT GTATCCACCT GGGGCCCTAG CTGGGCCCAG CTCCTGGACA 
301 GTGTCCTATG GCTGGGGGCA CTAGGACTGA CAATCCAGGC AGTCTTTTCC 
351 ACCACTGGCC CAGCCCTGCT GCTGCTTCTG GTCAGCTTCC TCACCTTTGA 
401 CCTGCTCCAT AGGCCCGCAG GTCACACTCT GCCACAGCGC AAACTTCTCA 
451 CCAGGGGCCA GAGTCAGGGG GCCGGTGAAG GTCCTGGACA GCAGGAGGCT
501 CTACTCCTGC AAATGGGTAC AGTCTCAGGA CAACTTAGCC TCCAGGACGC 
551 ACTGCTGCTG CTGCTCATGG GGCTGGGCCC GCTCCTGAGA GCCTGTGGCA 
601 TGCCCTTGAC CCTGCTTGGC CTGGCTTTCT GCCTCCATCC TTGGGCCTGA 
651 GAGCCCCTCC CCACAACTCA GTGTCCTTCA AATATACAAT GACCACCCTT 
701 CTTCAAAAAA AAAAAAAAAA AAC 


BLAST Results 


Entry HS860F19 from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 860F19
Score = 2059, P = 1.1e-85, identities = 423/434
2 exons 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 153 bp to 647 bp; peptide length: 165
Category: putative protein 
Classification: no clue 

1 MTRLCLPRPE AREDPIPVPP RGLGAGEGSG SPVRPPVSTW GPSWAQLLDS 

51 VLWLGALGLT IQAVFSTTGP ALLLLLVSFL TFDLLHRPAG HTLPQRKLLT 

101 RGQSQGAGEG PGQQEALLLQ MGTVSGQLSL QDALLLLLMG LGPLLRACGM 
151 PLTLLGLAFC LHPWA 
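The stated peptide length follows from the ORF coordinates, which appear to be inclusive and to exclude the stop codon. A minimal sketch of that arithmetic, under exactly that assumption and with a hypothetical function name:

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 153 bp to 647 bp: 495 nucleotides, 165 codons.
print(peptide_length(153, 647))
```

The same check can be applied to any entry by substituting its own ORF coordinates.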


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72dl3, frame 3 
No Alert BLASTP hits found 
Pedant information for DKFZphfbr2_72dl3, frame 3


Report for DKFZphfbr2_72dl3.3 

[LENGTH] 165
[MW] 17393.73
[pI] 7.80
[BLOCKS] BL00068A Malate dehydrogenase proteins
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 29.70 %


SEQ MTRLCLPRPEAREDPIPVPPRGLGAGEGSGSPVRPPVSTWGPSWAQLLDSVLWLGALGLT 

SEG

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhcccccc 

MEM 

SEQ IQAVFSTTGPALLLLLVSFLTFDLLHRPAGHTLPQRKLLTRGQSQGAGEGPGQQEALLLQ 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxx

PRD eeeecccccchhhhhhhhhhhhhhccccccccccccccccccccccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MGTVSGQLSLQDALLLLLMGLGPLLRACGMPLTLLGLAFCLHPWA 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccchhhhhhhhhhhhccchhhhhcccccchhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMM 


(No Prosite data available for DKFZphfbr2_72dl3.3)
(No Pfam data available for DKFZphfbr2_72dl3.3)
DKFZphfbr2_72112


group: nucleic acid management 

S^^Tr^iJi^'fi'^iiLfr^ ' ""''^^ ^"^"^ P"'^" to YDR126W and 

h^?tv°T«in^^»?^^"H''°"''^^"^^ "yc-type, helix-loop-helix dimerization domain signature This 
?h»^3i^r ? '^t",'"^'*"'^^ protein dinwrization and has been found in proteins such as 

The new protein can find application in modulating gene expression.


similarity to YDR126w ; 
membrane regions: 2 

similarity to YDR126w 


complete cDNA complete cds, EST hits 
Sequenced by LMU
Locus: unknown 


Insert length: 1270 bp 

Poly A stretch at pos. 1251, no polyadenylation signal found 


1 GGGGGCGCCC GGGAGGCGCC GGAGCCCAGC GGCTGGCGCC AGATCCAGGC 
51 TCCTGGAAGA ACCATGTCCG GCAGCTACTG GTCATGCCAG GCACACACTG 
101 CTGCCCAAGA GGAGCTGCTG TTTGAATTAT CTGTGAATGT TGGGAAGAGG 
151 AATGCCAGAG CTGCCGGCTG AAAATTACCC AACCAAGAGA AATCTGCAGG 
201 ATGGACTTTC TGGTCCTCTT CTTGTTCTAC CTGGCTTCGG TGCTGATGGG 
251 TCTTGTTCTT ATCTGCGTCT GCTCGAAAAC CCATAGCTTG AAAGGCCTGG 
301 CCAGGGGAGG AGCACAGATA TTTTCCTGTA TAATTCCAGA ATGTCTTCAG 
351 AGAGCCGTGC ATGGATTGCT TCATTACCTT TTCCATACGA GAAACCACAC 
401 CTTCATTGTC CTGCACCTGG TCTTGCAAGG GATGGTTTAT ACTGAGTACA
451 CCTGGGAAGT ATTTGGCTAC TGTCAGGAGC TGGAGTTGTC CTTGCATTAC 
501 CTTCTTCTGC CCTATCTGCT GCTAGGTGTA AACCTGTTTT TTTTCACCCT 
551 GACTTGTGGA ACCAATCCTG GCATTATAAC AAAAGCAAAT GAATTATTAT 
601 TTCTTCATGT TTATGAATTT GATGAAGTGA TGTTTCCAAA GAACGTGAGG 
651 TGCTCTACTT GTGATTTAAG GAAACCAGCT CGATCCAAGC ACTGCAGTGT 
701 GTGTAACTGG TGTGTGCACC GTTTCGACCA TCACTGTGTT TGGGTGAACA 
751 ACTGCATCGG GGCCTGGAAC ATCAGGTACT TCCTCATCTA CGTCTTGACC 
801 TTGACGGCCT CGGCTGCCAC CGTCGCCATT GTGAGCACCA CTTTTCTGGT 
851 CCACTTGGTG GTGATGTCAG ATTTATACCA GGAGACTTAC ATCGATGACC 
901 TTGGACACCT CCATGTTATG GACACGGTCA TTCTTATTCA GTACCTGTTC
951 CTGACTTTTC CACGGATTGT CTTCATGCTG GGCTTTGTCG TGGTCCTGAG 
1001 CTTCCTCCTG GGTGGCTACC TGTTGTCTGT CCTGTATCTG GCGGCCACCA 
1051 ACCAGACTAC TAACGAGTGG TACAGAGGTG TCTGGGCCTG GTGCCAGCGT 
1101 TGTCCCCTTG TGGCCTGGCC TCCGTCAGCA GAGCCCCAAG TCCACCGGAA 
1151 5?«^CACTCC CATGGGCTTC GGAGCAACCT TCAAGAGATC TTTCTACCTG
1201 CCTTTCCATG TCATGAGAGG AAGAAACAAG AATGACAAGT GTATGACTGC 
1251 CAAAAAAAAA AAAAAAAAAC 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 201 bp to 1232 bp; peptide length: 344 
Category: similarity to unknown protein 
1 MDFLVLFLFY LASVLMGLVL ICVCSKTHSL KGLARGGAQI FSCIIPECLQ
51 RAVHGLLHYL FHTRNHTFIV LHLVLQGMVY TEYTWEVFGY CQELELSLHY 
101 LLLPYLLLGV NLFFFTLTCG TNPGIITKAN ELLFLHVYEF DEVMFPKNVR 
151 CSTCDLRKPA RSKHCSVCNW CVHRFDHHCV WVNNCIGAWN IRYFLIYVLT 
201 LTASAATVAI VSTTFLVHLV VMSDLYQETY IDDLGHLHVM DTVILIQYLF 
251 LTFPRIVFML GFVVVLSFLL GGYLLSVLYL AATNQTTNEW YRGVWAWCQR
301 CPLVAWPPSA EPQVHRNIHS HGLRSNLQEI FLPAFPCHER KKQE 


BLASTP hits 


No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72112, frame 3 

TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid c13G1., N = 2, Score = 247, P = 1.4e-22

TREMBL:CED2021_3 gene: "D2021.2"; Caenorhabditis elegans cosmid
D2021., N = 1, Score = 209, P = 9e-17

TREMBL:CEC43H6_2 gene: "C43H6.7"; Caenorhabditis elegans cosmid
C43H6., N = 1, Score = 206, P = 5.2e-15

PIR:S52691 probable membrane protein YDR126w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 207, P = 8.4e-15

PIR:E71607 metal binding protein (DHHC domain) PFB0725c - malaria
parasite (Plasmodium falciparum), N = 1, Score = 182, P = 1.1e-13


>TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid c13G1.
Length = 356


HSPs : 


Score = 247 (37.1 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 55/148 (37%), Positives = 85/148 (57%)


Query: 52 AVHGLLHYLFHTRNH — TFIVLHLVLQGM VYTEYTWEVFGYCQELELSLHYLLLPY 105 

A+ L +Y+ + N F+ L L+ G+ +Y + F + + L +LLPY 

Sbjct: 64 AMRSLSNYVLYKNNPLVVFLYLALITIGIASFFIYGSSLTQKFSIIDWISV-LTSVLLPY 122 

Query: 106 LLLGVNLFrrTLTCGTNPGIITKANELLFLHVYEFD-EVMFPKNVRCSTCDLRKPARSKH 164 

++L+ + +NPG I N + +D ++ FP +CSTC KPARSKH 

Sbjct: 123 ISLY— -lAAKSNPGKIDLKNWNEASRRFPYDYKIFFPN—KCSTCKFEKPARSKH 173 

Query: 165 CSVCNWCVHRFDHHCVWVNNCIGAWNIRYFLIYVL 199 

C +CN CV +FDHHC+W+NNC+G N RYF + ++L 
Sbjct: 174 CRLCNICVEKFDHHCIWINNCVGLNNARYFFLFLL 208 


Score = 43 (6.5 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 10/35 (28%), Positives = 17/35 (48%) 

Query: 257 VFMLGFVV-VLSFLLGGYLLSVLYLAATNQTTNEW 290 

VF++ + VL L GY ++Y T + +W 
Sbjct: 254 VFLISLICSVLVLCLLGYEFFLVYAGYTTNESEKW 288 


Pedant information for DKFZphfbr2_72112, frame 3 


Report for DKFZphfbr2_72112.3

[LENGTH] 344
[MW] 39677.23
[pI] 7.26
[HOMOL] TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein"; S.pombe chromosome II cosmid c13G1. 3e-17
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR126w] 1e-16
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 8e-05
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 8e-05
[PIRKW] transmembrane protein 4e-15
[SUPFAM] ankyrin repeat homology 1e-10
[SUPFAM] unassigned ankyrin repeat proteins 1e-10
[PROSITE] MYRISTYL 4
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 2
[KW] SIGNAL_PEPTIDE 30
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 16.57 %


SEQ MDFLVLFLFYLASVLMGLVLICVCSKTHSLKGLARGGAQIFSCIIPECLQRAVHGLLHYL
SEG
PRD ccchhhhhhhhhhhhhhheeeeeeccccceeeeecccceeeeeeehhhhhhhhhhhheee
MEM

SEQ FHTRNHTFIVLHLVLQGMVYTEYTWEVFGYCQELELSLHYLLLPYLLLGVNLFFFTLTCG
SEG xxxxxxxxxxxxxxxxxxx
PRD ecccchhhhhhhhhhccchhhhhhhheeeeccceeehhhhhhhhhhhhhhcccceeeecc
MEM MMMMMMMMMMMMMMMMMMMMMMMMM

SEQ TNPGIITKANELLFLHVYEFDEVMFPKNVRCSTCDLRKPARSKHCSVCNWCVHRFDHHCV
SEG
PRD ccccccccccchhhhhhhhhcccccccceeeecccccccccccccccceeeecccccccc
MEM M MMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ WVNNCIGAWNIRYFLIYVLTLTASAATVAIVSTTFLVHLVVMSDLYQETYIDDLGHLHVM
SEG xxxxxxxxxxxxxxxxx
PRD cccccccccccchhhhhhhhhccchhhhhhhhhhhhhhhhhccccccccccccccccchh
MEM

SEQ DTVILIQYLFLTFPRIVFMLGFVVVLSFLLGGYLLSVLYLAATNQTTNEWYRGVWAWCQR
SEG xxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhccccccccceeecccchhhhhhhhhcccchhhhhhhhhhhcccc
MEM

SEQ CPLVAWPPSAEPQVHRNIHSHGLRSNLQEIFLPAFPCHERKKQE
SEG
PRD cccccccccccccceeecccccccccceeeeecccccccccccc
MEM


Prosite for DKFZphfbr2_72112.3


PS00001    65->69      ASN_GLYCOSYLATION    PDOC00001
PS00001    284->288    ASN_GLYCOSYLATION    PDOC00001
PS00005    29->32      PKC_PHOSPHO_SITE     PDOC00005
PS00006    152->156    CK2_PHOSPHO_SITE     PDOC00006
PS00006    229->233    CK2_PHOSPHO_SITE     PDOC00006
PS00006    286->290    CK2_PHOSPHO_SITE     PDOC00006
PS00008    32->38      MYRISTYL             PDOC00008
PS00008    77->83      MYRISTYL             PDOC00008
PS00008    120->126    MYRISTYL             PDOC00008
PS00008    322->328    MYRISTYL             PDOC00008


(No Pfam data available for DKFZphfbr2_72112.3)
DKFZphfbr2_72ml6


group: unknown 

DKFZphfbr2_72ml6 encodes a novel 287 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by LMU 

Locus: /map="26.2 cR from top of Chr16 linkage group"

Insert length: 1462 bp 

Poly A stretch at pos. 1441, polyadenylation signal at pos. 1421


1 GGGGAGGACC GGAGGACCGA GGACAGAAAG ATTGGTGGAC AGGAGCAGCG 
51 GCCGGTGGGG AGGGCGCTCG GCGGCGGCCT GCGGCCATGG CCACCGTGAT 
101 GGCAGCGACG GCGGCGGAGC GGGCGGTGCT GGAGGAGGAG TTCCGCTGGC 
151 TGCTGCACGA CGAGGTGCAC GCTGTGTTGA AGCAGCTGCA GGACATCCTC 
201 AAGGAGGCCT CTCTGCGCTT CACTCTGCCG GGCTCCGGCA CTGAGGGGCC 
251 CGCCAAGCAA GAGAACTTCA TCCTAGGCAG CTGTGGCACA GACCAGGTGA 
301 AGGGTGTGCT GACTCTGCAG GGGGATGCCC TCAGCCAGGC GGATGTGAAC 
351 CTGAAGATGC CCCGGAACAA CCAGCTGCTG CACTTCGCCT TCCGGGAGGA 
401 CAAGCAGTGG AAGCTGCAGC AGATCCAGGA TGCCAGAAAC CATGTGAGCC 
451 AAGCCATTTA CCTGCTTACC AGCCGGGACC AGAGCTACCA GTTCAAGACG 
501 GGCGCTGAGG TCCTCAAGCT GATGGACGCA GTGATGCTGC AGCTGACCAG 
551 AGCCCGAAAC CGGCTCACCA CCCCCGCCAC CCTCACCCTC CCCGAGATCG 
601 CCGCCAGCGG CCTCACGCGG ATGTTCGCCC CTGCCCTGCC GTCCGACCTG 
651 CTGGTCAACG TCTACATCAA CCTCAACAAG CTCTGCCTCA CGGTGTACCA 
701 GCTGCATGCC CTGCAGCCCA ACTCCACCAA GAACTTCCGC CCAGCTGGGG 
751 GGGCGGTGCT GCATAGCCCT GGGGCCATGT TCGAGTGGGG CTCTCAGCGC
801 CTGGAGGTGA GCCACGTGCA CAAAGTGGAG TGCGTGATCC CCTGGCTCAA 
851 CGACGCCCTG GTCTACTTCA CCGTCTCCCT GCAGCTCTGC CAGCAGCTTA 
901 AGGACAAGAT CTCCGTGTTC TCCAGCTACT GGAGCTACAG ACCCTTCTGA 
951 TCACAGCACC CAGGAGCTTG TCTCCAGGAA GGCGGCCCCG TCCCCTACTC 
1001 ATACCCACCA CAGAGCACCA GCCAGTGCCA ACGCCAGGCT GCTATTTATC 
1051 TCCCTATCCC ACCCCCTACC CCACCTAACA CATTTGCACT GCCGGGAATG 
1101 GACACTGGAA GTGCCAGGAG GAAGGAAGGC TGGTTTGGTG GGGTAGTGGG 
1151 GAGGTCAGGG AGGCGGGGCC AAGGGTGTCC CACATTCCCA ACACCGCCCT 
1201 CTGATCACCA TGGGAATCTT TGGACTCAGG ACAGGGCCAG GCGCAGGGCT 
1251 CTCCCTCCTC TCCCCTTCGC TGTCCCCTCC CCCTGGAGGG CATGGTGTCG 
1301 GGGGGTGGCA CTGAGCTATG AGTCCCGGGG ATGGTGAGGA ACGCCACAGA 
1351 CAGAGCCACC CTAGGAGTGA GTATAGTGCT GGTGACTGTG TTTCATAGCC 
1401 CCAGTCCAGG GCTGTCTAAG AAATAAAGAT CATCAGACTC CAAAAAAAAA 
1451 AAAAAAAAAA AC 
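The poly A stretch and polyadenylation signal positions quoted for this insert can be located in the last two rows of the listing. The sketch below assumes the canonical AATAAA hexamer as the signal and an arbitrary minimum run of eight A residues for the poly A stretch; neither threshold is stated in the document. It also suggests that the reported positions are zero-based offsets, one less than the 1-based coordinates in the left margin, which is an inference from this entry rather than a documented convention.

```python
import re

# Insert positions 1401-1462 as printed in the listing (1-based),
# so the window starts at zero-based offset 1400.
TAIL = (
    "CCAGTCCAGGGCTGTCTAAGAAATAAAGATCATCAGACTCCAAAAAAAAA"
    "AAAAAAAAAAAC"
)
OFFSET = 1400


def find_signal(window: str, offset: int, signal: str = "AATAAA"):
    """Zero-based offset of the first signal hexamer, or None."""
    i = window.find(signal)
    return None if i < 0 else offset + i


def find_poly_a(window: str, offset: int, min_run: int = 8):
    """Zero-based offset of the first run of at least min_run A residues."""
    m = re.search("A{%d,}" % min_run, window)
    return None if m is None else offset + m.start()


print(find_signal(TAIL, OFFSET), find_poly_a(TAIL, OFFSET))
```

Both offsets coincide with the positions given in the entry header when read as zero-based values.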


BLAST Results 


Entry HS604351 from database EMBL: 
human STS WI-18474 . 
Score = 1178, P = 1.5e-48, identities = 250/268 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 87 bp to 947 bp; peptide length: 287 
Category: similarity to unknown protein 
1 MATVMAATAA ERAVLEEEFR WLLHDEVHAV LKQLQDILKE ASLRFTLPGS

51 GTEGPAKQEN FILGSCGTDQ VKGVLTLQGD ALSQADVNLK MPRNNQLLHF 

101 AFREDKQWKL QQIQDARNHV SQAIYLLTSR DQSYQFKTGA EVLKLMDAVM 

151 LQLTRARNRL TTPATLTLPE IAASGLTRMF APALPSDLLV NVYINLNKLC

201 LTVYQLHALQ PNSTKNFRPA GGAVLHSPGA MFEWGSQRLE VSHVHKVECV 

251 IPWLNDALVY FTVSLQLCQQ LKDKISVFSS YWSYRPF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72ml6, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_72ml6, frame 3 


Report for DKFZphfbr2_72ml6.3 


[LENGTH] 287
[MW] 32254.40
[pI] 8.30
[HOMOL] TREMBL:AF025459_2 gene: "H14A12.3"; Caenorhabditis elegans cosmid H14A12. 3e-14
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 5
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 6.27 %


SEQ MATVMAATAAERAVLEEEFRWLLHDEVHAVLKQLQDILKEASLRFTLPGSGTEGPAKQEN 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhh 

SEQ FILGSCGTDQVKGVLTLQGDALSQADVNLKMPRNNQLLHFAFREDKQWKLQQIQDARNHV 

SEG 

PRD hhccccccceeeeeeeeccccchhhhhhhcccccchhhhhhhhhchhhhhhhhhhhhchh 

SEQ SQAIYLLTSRDQSYQFKTGAEVLKLMDAVMLQLTRARNRLTTPATLTLPEIAASGLTRMF 

SEG 

PRD hhhhhhhhccccceeecchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccc 

SEQ APALPSDLLVNVYINLNKLCLTVYQLHALQPNSTKNFRPAGGAVLHSPGAMFEWGSQRLE 

SEG 

PRD cccccccceeeeehhhhhhhhhhheeeecccccccccccccceeecccccccccccccee 

SEQ VSHVHKVECVIPWLNDALVYFTVSLQLCQQLKDKISVFSSYWSYRPF 

SEG 

PRD eeeeeeeeeeeecccceeeeeeehhhhhhhhhhhhheeeeeeeeccc 


Prosite for DKFZphfbr2_72ml6.3

PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00005 42->45 PKC_PHOSPHO_SITE PDOC00005
PS00005 128->131 PKC_PHOSPHO_SITE PDOC00005
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005
PS00005 236->239 PKC_PHOSPHO_SITE PDOC00005
PS00005 283->286 PKC_PHOSPHO_SITE PDOC00005
PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006
PS00006 50->54 CK2_PHOSPHO_SITE PDOC00006
PS00006 83->87 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 138->142 CK2_PHOSPHO_SITE PDOC00006
PS00006 167->171 CK2_PHOSPHO_SITE PDOC00006
PS00008 64->70 MYRISTYL PDOC00008


(No Pfam data available for DKFZphfbr2_72ml6.3) 




DKFZphfbr2_72nl2 


group: brain derived 

DKFZphfbr2_72nl2 encodes a novel 117 amino acid protein with similarity to a protein with 
conserved sequence in bacteria and eukaryota. 

The protein is very similar to human MM46, human and rat ganglioside expression factor 2, 
[...] Laccaria bicolor symbiosis-related protein LBU93506_1. The function of this highly 
conserved protein is not known. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

strong similarity to rat GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2) 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="12" 
Insert length: 1880 bp 

Poly A stretch at pos. 1859, polyadenylation signal at pos. 1830 


1 GGGGGCCGGT ATTTCTCCAT CTGGCTCTCC TCTACCTCCA GGCAGGCTCA 
51 CCCGAGATCC CCGCCCCGAA CCCCCCCTGC ACACTCGGCC CAGCGCTGTT 
101 GCCCCCGGAG CGGACGTTTC TGCAGCTATT CTGAGCACAC CTTGACGTCG 
151 GCTGAGGGAG CGGGACAGGG TCAGCGGCGA AGGAGGCAGG CCCCGCGCGG 
201 GGATCTCGGA AGCCCTGCGG TGCATCATGA AGTTCCAGTA CAAGGAGGAC 
251 CATCCCTTTG AGTATCGGAA AAAGGAAGGA GAAAAGATCC GGAAGAAATA 
301 TCCGGACAGG GTCCCCGTGA TTGTAGAGAA GGCTCCAAAA GCCAGGGTGC 
351 CTGATCTGGA CAAGAGGAAG TACCTAGTGC CCTCTGACCT TACTGTTGGC 
401 CAGTTCTACT TCTTAATCCG GAAGAGAATC CACCTGAGAC CTGAGGACGC 
451 CTTATTCTTC TTTGTCAACA ACACCATCCC TCCCACCAGT GCTACCATGG 
501 GCCAACTGTA TGAGGACAAT CATGAGGAAG ACTATTTTCT GTATGTGGCC 
551 TACAGTGATG AGAGTGTCTA TGGGAAATGA GTGGTTGGAA GCCCAGCAGA 
601 TGGGAGCACC TGGACTTGGG GGTAGGGGAG GGGTGTGTGT GCGCGACATG 
651 GGGAAAGAGG GTGGCTCCCA CCGCAAGGAG ACAGAAGGTG AAGACATCTA 
701 GAAACATTAC ACCACACACA CCGTCATCAC ATTTTCACAT GCTCAATTGA 
751 TATTTTTTGC TGCTTCCTCG GCCCAGGGAG AAAGCATGTC AGGACAGAGC 
801 TGTTGGATTG GCTTTGATAG AGGAATGGGG ATGATGTAAG TTTACAGTAT 
851 TCCTGGGGTT TAATTGTTGT GCAGTTTCAT AGATGGGTCA GGAGGTGGAC 
901 AAGTTGGGGC CAGAGATGAT GGCAGTCCAG CAGCAACTCC CTGTGCTCCC 
951 TTCTCTTTGG GCAGAGATTC TATTTTTGAC ATTTGCACAA GACAGGTAGG 
1001 GAAAGGGGAC TTGTGGTAGT GGACCATACC TGGGGACCAA AAGAGACCCA 
1051 CTGTAATTGA TGCATTGTGG CCCCTGATCT TCCCTGTCTC ACACTTCTTT 
1101 TCTCCCATCC CGGTTGCAAT CTCACTCAGA CATCACAGTA CCACCCCAGG 
1151 GGTGGCAGTA GACAACAACC CAGAAATTTA GACAGGGATC TCTTACCTTT 
1201 GGAAAATAGG GGTTAGGCAT GAAGGTGGTT GTGATTAAGA AGATGGTTTT 
1251 GTTATTAAAT AGCATTAAAC TGGAATTGAC AAGAGTGTTG AGCATCCCTG 
1301 TCTAACCTGC TCTTTCTCTT TGGTGCCCCT TATCTCACCC CTTCCTTGGA 
1351 ATTTAATAAG TCTCAGGCAT TTCCAATTGT AGACTAAAAC CACTCTTAGC 
1401 ATCTCCTCTA GTATTTTCCA TGTATCAGGA AAGAGGTGTC TTATGTAGGG 
1451 AGGGGGCAAG TATGAAGTAA GGTAATTATA TACTACTCTC ATTCAGGATT 
1501 CTTGCTCCCA TGCTGCTGTC CCTTCAGGCT CACATGCACA GGAATGCTAC 
1551 ATGATGGCCA GCTGCTTCCC TCCTTGGTTA TCATCCACTG CAGCTGCTAG 
1601 TTAGAAAGGT TTGGAGGGAT GACTTTTAGT AAATCATGGG GATTTTATTG 
1651 ATTTATTTTC ACTTTTGGGA TTTTGTGGGG TGGGAGTGGG GAGCAGGAAT 
1701 TGCACTCAGA CATGACATTT CAATTCATCT CTGCTAATGA AAAGGGTTCT 
1751 TTCTCTTGGG GGAAATGTGT GTGTCAGTTC TGTCAGCTGC AAGTTCTTGT 
1801 ATAATGAAGT CAATGCCATC AGGCCAAGGA AATAAAATAA TTGCTTACCT 
1851 TAAAAATCGA AAAAAAAAAA AAAAAAAAAC 


BLAST Results 


Entry HS418210 from database EMBL: 
human STS SHGC-10496. 
Score = 1916, P = 4.0e-80, identities = 394/400 

Entry AC006514 from database EMBLNEW: 

*** SEQUENCING IN PROGRESS *** Homo sapiens; HTGS phase 1, 68 unordered 
pieces. 

Score = 610, P = 2.7e-16, identities = 128/134 
4 exons 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 227 bp to 577 bp; peptide length: 117 
Category: strong similarity to known protein 


1 MKFQYKEDHP FEYRKKEGEK IRKKYPDRVP VIVEKAPKAR VPDLDKRKYL 
51 VPSDLTVGQF YFLIRKRIHL RPEDALFFFV NNTIPPTSAT MGQLYEDNHE 
101 EDYFLYVAYS DESVYGK 
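
The ORF coordinates and the peptide length quoted above can be cross-checked with a few lines of Python. The sketch below is illustrative only: it assumes the coordinates are 1-based and inclusive and that the stop codon is excluded from the ORF, which is what the numbers in this listing suggest ((577 - 227 + 1) / 3 = 117). The helper name `translate` and the variable names are hypothetical, and `fragment` is simply bases 221-250 of the insert as printed in the sequence above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start, end):
    """Translate the 1-based, inclusive interval [start, end] of dna."""
    orf = dna[start - 1:end].upper()
    if len(orf) % 3 != 0:
        raise ValueError("interval length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


# Bases 221-250 of the DKFZphfbr2_72nl2 insert, copied from the listing.
fragment = "TGCATCATGAAGTTCCAGTACAAGGAGGAC"
offset = 221  # position of the first base of `fragment` in the insert

# The ORF is reported to start at base 227; translate up to base 250.
peptide_start = translate(fragment, 227 - offset + 1, 250 - offset + 1)
print(peptide_start)

# Peptide length implied by the reported ORF coordinates (227..577).
orf_codons = (577 - 227 + 1) // 3
print(orf_codons)
```

Under these assumptions the first eight residues agree with the start of the peptide printed above, and the codon count agrees with the stated peptide length.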


BLASTP hits 

Entry YQD9_CAEEL from database SWISSPROT: 
HYPOTHETICAL 14.8 KD PROTEIN C32D5.9 IN CHROMOSOME II. 
Score = 496, P = 1.8e-47, identities = 91/116, positives = 105/116 

Entry SYRP_LACBI from database SWISSPROT: 
SYMBIOSIS-RELATED PROTEIN. 
Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117 

Entry LBU93506_1 from database TREMBL: 
product: "symbiosis-related protein"; Laccaria bicolor 
symbiosis-related protein mRNA, partial cds. 
Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117 

Entry GEF2_RAT from database SWISSPROT: 
GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2). 
Score = 373, P = 2.0e-34, identities = 71/116, positives = 88/116 


Alert BLASTP hits for DKFZphfbr2_72nl2, frame 2 

TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete 
cds., N = 1, Score = 549, P = 4.7e-53 

SWISSPROT:GEF2_HUMAN GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2)., N = 1, 
Score = 373, P = 2.1e-34 

>TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete 
cds. 

Length = 117 

HSPs: 

Score = 549 (82.4 bits), Expect = 4.7e-53, P = 4.7e-53 

Identities = 101/116 (87%), Positives = 110/116 (94%) 

Query:   1 MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 60 
           MKF YKE+HPFE R+ EGEKIRKKYPDRVPVIVEKAPKAR+ DLDK+KYLVPSDLTVGQF 
Sbjct:   1 MKFVYKEEHPFEKRRSEGEKIRKKYPDRVPVIVEKAPKARIGDLDKKKYLVPSDLTVGQF 60 

Query:  61 YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYG 116 
           YFLIRKRIHLR EDALFFFVNN IPPTSATMGQLY+++HEED+FLY+AYSDESVYG 
Sbjct:  61 YFLIRKRIHLRAEDALFFFVNNVIPPTSATMGQLYQEHHEEDFFLYIAYSDESVYG 116 
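
The identity figure reported for this HSP can be recomputed directly from the two aligned rows. The following Python sketch is only an illustration: the function name `count_identities` is hypothetical, the two strings are the Query and Sbjct rows of the alignment above joined end to end, and the truncating percentage is an assumption made to mirror the way the report appears to print percentages. Counting "positives" would additionally require a substitution matrix and is not attempted here.

```python
def count_identities(query, sbjct):
    """Count alignment columns in which both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Query (DKFZphfbr2_72nl2, residues 1-116) and Sbjct (MM46, residues 1-116).
query = (
    "MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF"
    "YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYG"
)
sbjct = (
    "MKFVYKEEHPFEKRRSEGEKIRKKYPDRVPVIVEKAPKARIGDLDKKKYLVPSDLTVGQF"
    "YFLIRKRIHLRAEDALFFFVNNVIPPTSATMGQLYQEHHEEDFFLYIAYSDESVYG"
)

identities = count_identities(query, sbjct)
columns = len(query)
percent = identities * 100 // columns  # truncated, not rounded (assumption)
print(f"Identities = {identities}/{columns} ({percent}%)")
```

The same function can be applied to any gapped alignment in this listing as long as gap columns are written as "-" in both rows.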

Pedant information for DKFZphfbr2_72nl2, frame 2 
Report for DKFZphfbr2_72nl2.2 

[LENGTH] 117
[MW] 14044.07
[pI] 8.67
[HOMOL] TREMBL:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete cds. 1e-
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBL078c] 4e-36
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YBL078c] 4e-36
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YBL078c] 4e-36
[SUPFAM] hypothetical protein YBL078c 8e-35
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta

SEQ MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 
PRD cccccccccchhhhhhhhhhhhhhccccceeeeeccccccccccccceeecccccchhhh 

SEQ YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYGK 
PRD hhhhhhhhhhccccceeeeecccccccchhhhhhhhhccccceeeeeeecccccccc 


Prosite for DKFZphfbr2_72nl2.2 

PS00001 81->85 ASN_GLYCOSYLATION PDOC00001 
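
A motif scan of this kind can be approximated with regular expressions. The Python sketch below is a simplified illustration, not the Prosite or Pedant implementation: the three patterns are the commonly cited Prosite consensus patterns for PS00001, PS00005 and PS00006 rewritten as regular expressions, the function name `scan` is hypothetical, and the printed end coordinate (start plus pattern length) is an assumption chosen to mirror how ranges appear in this listing.

```python
import re

# Simplified regular-expression forms of three Prosite patterns (assumed):
#   PS00001  N-{P}-[ST]-{P}     N-glycosylation site
#   PS00005  [ST]-x-[RK]        protein kinase C phosphorylation site
#   PS00006  [ST]-x(2)-[DE]     casein kinase II phosphorylation site
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}


def scan(peptide, pattern):
    """Return (start, end) pairs for all matches, overlapping ones included.

    start is 1-based; end is start + match length, as in the listing.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# The 117-residue peptide of DKFZphfbr2_72nl2, frame 2, from the listing.
peptide = (
    "MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF"
    "YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYGK"
)

results = {name: scan(peptide, pat) for name, pat in PATTERNS.items()}
for name, hits in results.items():
    for start, end in hits:
        print(f"{start}->{end} {name}")
```

For this peptide the sketch finds a single N-glycosylation site and no PKC or CK2 sites, in line with the single PROSITE entry in the report above.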


(No Pfam data available for DKFZphfbr2_72nl2.2) 


DKFZphfbr2_78c24 


group: signal transduction 

DKFZphfbr2_78c24 encodes a novel 563 amino acid protein with strong similarity to guanylate- 
binding proteins (GBPs). 

GBPs were originally described as proteins that are strongly induced by interferons and are 
capable of binding to agarose-immobilized guanine nucleotides. hGBP1, the first of two members 
of this protein family in humans, represents a novel type of GTPase. The novel protein 
contains an ATP/GTP-binding site motif A (P-loop) and an RGD cell attachment site. It seems to 
be a new member of the GBP family and shows a splicing pattern not described previously. 

The new protein can find application in modulating/blocking the response of cells to 
interferons. 


strong similarity to guanine nucleotide-binding protein 1/2 
but different "splice variant" aa 211-245 of GBP1/2 missing 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2952 bp 

Poly A stretch at pos. 2927, polyadenylation signal at pos. 2914 


1 CAGTTTCATT AGGCTCTGAA GCCATTACAA AGGTTGCTTA ACTTCTAATT 
51 ATTTGATCAC TGAGGAAAAT CCAGAAAGCT ACACAACACT GAAGGGGTGA 
101 AATAAAAGTC CAGCGATCCA GCGAAAGAAA AGAGAAGTGA CAGAAACAAC 
151 TTTACCTGGA CTGAAGATAA AAGCACAGAC AAGAGAACAA TGCCCTGGAC 
201 ATGGCTCCAG AGATCCACAT GACAGGCCCA ATGTGCCTCA TTGAGAACAC 
251 TAATGGGGAA CTGGTGGCGA ATCCAGAAGC TCTGAAAATC CTGTCTGCCA 
301 TTACACAGCC TGTGGTGGTG GTGGCAATTG TGGGCCTCTA CCGCACAGGA 
351 AAATCCTACC TGATGAACAA GCTAGCTGGG AAGAATAAGG GCTTCTCTCT 
401 GGGCTCCACA GTGAAATCTC ACACCAAAGG AATCTGGATG TGGTGTGTGC 
451 CTCACCCCAA AAAGCCAGAA CACACCTTAG TCCTGCTTGA CACTGAGGGC 
501 CTGGGAGATG TAAAGAAGGG TGACAACCAG AATGACTCCT GGATCTTCAC 
551 CCTGGCCGTC CTCCTGAGCA GCACTCTCGT GTACAATAGC ATGGGAACCA 
601 TCAACCAGCA GGCTATGGAC CAACTGTACT ATGTGACAGA GCTGACACAT 
651 CGAATCCGAT CAAAATCCTC ACCTGATGAG AATGAGAATG AGGATTCAGC 
701 TGACTTTGTG AGCTTCTTCC CAGATTTTGT GTGGACACTG AGAGATTTCT 
751 CCCTGGACTT GGAAGCAGAT GGACAACCCC TCACACCAGA TGAGTACCTG 
801 GAGTATTCCC TGAAGCTAAC GCAAGGTAAC AGGAAGCTTG CCCAGCTTGA 
851 GAAACTACAA GATGAAGAGC TGGACCCTGA ATTTGTGCAA CAAGTAGCAG 
901 ACTTCTGTTC CTACATCTTT AGCAATTCCA AAACTAAAAC TCTTTCAGGA 
951 GGCATCAAGG TCAATGGGCC TTGTCTAGAG AGCCTAGTGC TGACCTATAT 
1001 CAATGCTATC AGCAGAGGGG ATCTGCCCTG CATGGAGAAC GCAGTCCTGG 
1051 CCTTGGCCCA GATAGAGAAC TCAGCCGCAG TGCAAAAGGC TATTGCCCAC 
1101 TATGACCAGC AGATGGGCCA GAAGGTGCAG CTGCCCGCAG AAACCCTCCA 
1151 GGAGCTGCTG GACCTGCACA GGGTTAGTGA GAGGGAGGCC ACTGAAGTCT 
1201 ATATGAAGAA CTCTTTCAAG GATGTGGACC ATCTGTTTCA AAAGAAATTA 
1251 GCGGCCCAGC TAGACAAAAA GCGGGATGAC TTTTGTAAAC AGAATCAAGA 
1301 AGCATCATCA GATCGTTGCT CAGCTTTACT TCAGGTCATT TTCAGTCCTC 
1351 TAGAAGAAGA AGTGAAGGCG GGAATTTATT CGAAACCAGG GGGCTATTGT 
1401 CTCTTTATTC AGAAGCTACA AGACCTGGAG AAAAAGTACT ATGAGGAACC 
1451 AAGGAAGGGG ATACAGGCTG AAGAGATTCT GCAGACATAC TTGAAATCCA 
1501 AGGAGTCTGT GACCGATGCA ATTCTACAGA CAGACCAGAT TCTCACAGAA 
1551 AAGGAAAAGG AGATTGAAGT GGAATGTGTA AAAGCTGAAT CTGCACAGGC 
1601 TTCAGCAAAA ATGGTGGAGG AAATGCAAAT AAAGTATCAG CAGATGATGG 
1651 AAGAGAAAGA GAAGAGTTAT CAAGAACATG TGAAACAATT GACTGAGAAG 
1701 ATGGAGAGGG AGAGGGCCCA GTTGCTGGAA GAGCAAGAGA AGACCCTCAC 
1751 TAGTAAACTT CAGGAACAGG CCCGAGTACT AAAGGAGAGA TGCCAAGGTG 
1801 AAAGTACCCA ACTTCAAAAT GAGATACAAA AGCTACAGAA GACCCTGAAA 
1851 AAAAAAACCA AGAGATATAT GTCGCATAAG CTAAAGATCT AAACAACAGA 
1901 GCTTTTCTGT CATCCTAACC CAAGGCATAA CTGAAACAAT TTTAGAATTT 
1951 GGAACAAGTG TCACTATATT TGATAATAAT TAGATCTTGC ATCATAACAC 
2001 TAAAAGTTTA CAAGAACATG CAGTTCAATG ATCAAAATCA TGTTTTTTCC 
2051 TTAAAAAGAT TGTAAATTGT GCAACAAAGA TGCATTTACC TCTGTACCAA 
2101 CAGAGGAGGG ATCATGAGTT GCCACCACTC AGAAGTTTAT TCTTCCAGAC 
2151 GACCAGTGGA TACTGAGGAA AGTCTTAGGT AAAAATCTTG GGACATATTT 
2201 GGGCACTGGT TTGGCCAAGT GTACAATAGG TCCCAATATC AGAAACAACC 
2251 ATCCTAGCTT CCTAGGGAAG ACAGTGTACA GTTCTCCATT ATATCAAGGC 
2301 TACAAGGTCr ATGAGCAATA ATGTGATTTC TGGACATTGC CCATGGATAA 
2351 TTCTCACTGA TGGATCTCAA GCTAAAGCAA ACCATCTTAT ACAGAGATCT 
2401 AGAATCTTAT ATTTTCCATA GGAAGGT7UUV GAAATCATTA GCAAGAGTAG 
2451 GAATTGAATC ATAAACAAAT TGGCTAATGA AGAAATCTTT TCTTTCTTGT 
2501 TCAATTCATC TAGATTATAA CCTTAATGTG ACACCTGAGA CCTTTAGACA 
2551 GTTGACCCTG AATTAAATAG TCACATGGTA ACAATTATGC ACTGTGTAAT 
2601 TTTAGTAATG TATAACATGC AATGATGCAC TTTAACTGAA GATAGAGACT 
2651 ATGTTAGAAA ATTGAACTAA TTTAATTATT TGATTGTTTT AATCCTAAAG 
2701 CATAAGTTAG TCTTTTCCTG ATTCTTAAAG GTCATACTTG AAATCCTGCC 
2751 AATTTTCCCC AAAGGGAATA TGGAATTTTT TTTGACTTTC TTTTGAGCAA 
2801 TAAAATAATT GTCTTGCCAT TACTTAGTAT ATGTAGACTT CATCCCAATT 
2851 GTCAAACATC CTAGGTAAGT GGTTGACATT TCTTACAGCA ATTACAGATT 
2901 ATTTTTGAAC TAGAAATAAA CTAAACTAGA AACAAAAAAA AAAAAAAAAA 
2951 AA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 201 bp to 1889 bp; peptide length: 563 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: RGD (272-275) 
ATP_GTP_A (45-53) 


1 MAPEIHMTGP MCLIENTNGE LVANPEALKI LSAITQPVVV VAIVGLYRTG 

51 KSYLMNKLAG KNKGFSLGST VKSHTKGIWM WCVPHPKKPE HTLVLLDTEG 
101 LGDVKKGDNQ NDSWIFTLAV LLSSTLVYNS MGTINQQAMD QLYYVTELTH 
151 RIRSKSSPDE NENEDSADFV SFFPDFVWTL RDFSLDLEAD GQPLTPDEYL 
201 EYSLKLTQGN RKLAQLEKLQ DEELDPEFVQ QVADFCSYIF SNSKTKTLSG 
251 GIKVNGPCLE SLVLTYINAI SRGDLPCMEN AVLALAQIEN SAAVQKAIAH 
301 YDQQMGQKVQ LPAETLQELL DLHRVSEREA TEVYMKNSFK DVDHLFQKKL 
351 AAQLDKKRDD FCKQNQEASS DRCSALLQVI FSPLEEEVKA GIYSKPGGYC 
401 LFIQKLQDLE KKYYEEPRKG IQAEEILQTY LKSKESVTDA ILQTDQILTE 
451 KEKEIEVECV KAESAQASAK MVEEMQIKYQ QMMEEKEKSY QEHVKQLTEK 
501 MERERAQLLE EQEKTLTSKL QEQARVLKER CQGESTQLQN EIQKLQKTLK 
551 KKTKRYMSHK LKI 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78c24, frame 3 

PIR:A41268 guanine nucleotide-binding protein 1 - human, N = 2, Score = 
1306, P = 4.9e-238 

PIR:A46459 macrophage-activation gene-1 protein mag-1 - mouse, N = 2, 
Score = 942, P = 8.9e-184 

PIR:S70524 guanine nucleotide-binding protein 2 - human, N = 2, Score = 
1131, P = 4.1e-210 

TREMBL:AF077007_1 gene: "Gbp2"; product: "interferon-induced guanylate 
binding protein GBP-2"; Mus musculus interferon-induced guanylate 
binding protein GBP-2 (Gbp2) mRNA, complete cds., N = 2, Score = 904, P 
= 1.2e-179 


>PIR:A41268 guanine nucleotide-binding protein 1 - human 
Length = 592 

HSPs: 

Score = 1306 (195.9 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238 
Identities = 264/332 (79%), Positives = 288/332 (86%) 

Query: 211 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIKVNGPCLESLVLTYINAI 270 
           RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGI+VNGP LESLVLTY+NAI 
Sbjct: 245 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIQVNGPRLESLVLTYVNAI 304 

Query: 271 SRGDLPCMENAVLALAQIENSAAVQKAIAHYDQQMGQKVQLPAETLQELLDLHRVSEREA 330 
           S GDLPCMENAVLALAQIENSAAVQKAIAHY+QQMGQKVQLP E+LQELLDLHR SEREA 
Sbjct: 305 SSGDLPCMENAVLALAQIENSAAVQKAIAHYEQQMGQKVQLPTESLQELLDLHRDSEREA 364 

Query: 331 TEVYMKNSFKDVDHLFQKKLAAQLDKKRDDFCKQNQEASSDRCSALLQVIFSPLEEEVKA 390 
            EV++++SFKDVDHLFQK+LAAQL+KKRDDFCKQNQEASSDRCS LLQVIFSPLEEEVKA 
Sbjct: 365 IEVFIRSSFKDVDHLFQKELAAQLEKKRDDFCKQNQEASSDRCSGLLQVIFSPLEEEVKA 424 

Query: 391 GIYSKPGGYCLFIQKLQDLEKKYYEEPRKGIQAEEILQTYLKSKESVTDAILQTDQILTX 450 
           GIYSKPGGY LF+QKLQDL+KKYYEEPRKGIQAEEILQTYLKSKES+TDAILQTDQ LT 
Sbjct: 425 GIYSKPGGYRLFVQKLQDLKKKYYEEPRKGIQAEEILQTYLKSKESMTDAILQTDQTLTE 484 

Query: 451 XXXXXXXXXXXXXSAQASAKMVEEMQIKYQQMMEEKEKSYQEHVKQLTEKMXXXXXXXXX 510 
                        SAQASAKM++EMQ K +QMME+KE+SYQEH+KQLTEKM 
Sbjct: 485 

Query: 511 XXXKTLTSKLQEQARVLKERCQGESTQLQNEI 542 
              +TL  KLQEQ ++LKE  Q ES  ++NEI 
Sbjct: 545 

Score = 1012, Sum P(2) = 4.9e-238 
Identities = 194/211 (91%), Positives = 200/211 (94%) 

Query:   1 MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 60 
           MA EIHMTGPMCLIENTNG L+ANPEALKILSAITQP+VVVAIVGLYRTGKSYLMNKLAG 
Sbjct:   1 MASEIHMTGPMCLIENTNGRLMANPEALKILSAITQPMVVVAIVGLYRTGKSYLMNKLAG 60 

Query:  61 KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 120 
           K KGFSLGSTV+SHTKGIWMWCVPHPKKP H LVLLDTEGLGDV+KGDNQNDSWIF LAV 
Sbjct:  61 KKKGFSLGSTVQSHTKGIWMWCVPHPKKPGHILVLLDTEGLGDVEKGDNQNDSWIFALAV 120 

Query: 121 LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENE--DSADFVSFFPDFVW 178 
           LLSST VYNS+GTINQQAMDQLYYVTELTHRIRSKSSPDENENE  DSADFVSFFPDFVW 
Sbjct: 121 LLSSTFVYNSIGTINQQAMDQLYYVTELTHRIRSKSSPDENENEVEDSADFVSFFPDFVW 180 

Query: 179 TLRDFSLDLEADGQPLTPDEYLEYSLKLTQG 209 
           TLRDFSLDLEADGQPLTPDEYL YSLKL +G 
Sbjct: 181 TLRDFSLDLEADGQPLTPDEYLTYSLKLKKG 211 


Pedant information for DKFZphfbr2_78c24, frame 3 

Report for DKFZphfbr2_78c24.3 

[LENGTH] 563
[MW] 64127.72
[pI] 5.45
[HOMOL] PIR:A41268 guanine nucleotide-binding protein 1 - human 0.0
[SUPFAM] guanine nucleotide-binding protein 1 0.0
[PROSITE] ATP_GTP_A 1
[PROSITE] RGD 1
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 6.75 %
[KW] COILED_COIL 10.48 %


SEQ MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 

SEG 

PRD cccccccccceeeeeccccchhhhhhhhhhhhhhhcceeeeeeeecccccchhhhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 

SEG 

PRD cccccccccccccccceeeeeecccccccceeeeeeeccccccccccccccchhhhhhhh 

COILS 

MEM 

SEQ LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENEDSADFVSFFPDFVWTL 

SEG 

PRD hhhhheeeccccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeccceeeeh 

COILS 

MEM 

SEQ RDFSLDLEADGQPLTPDEYLEYSLKLTQGNRKLAQLEKLQDEELDPEFVQQVADFCSYIF 

SEG 

PRD hhhhhhhhccccccccchhhhhhhhhhccchhhhhhhhhhhhhcccchhhhhhhhhhhhc 

COILS 

MEM 

SEQ SNSKTKTLSGGIKVNGPCLESLVLTYINAISRGDLPCMENAVLALAQIENSAAVQKAIAH 

SEG 

PRD cccceeeccccccccccchhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ YDQQMGQKVQLPAETLQELLDLHRVSEREATEVYMKNSFKDVDHLFQKKLAAQLDKKRDD 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ FCKQNQEASSDRCSALLQVIFSPLEEEVKAGIYSKPGGYCLFIQKLQDLEKKYYEEPRKG 

SEG 

PRD hhhhhhchhhhhhhhhhhhhhhhhhhhhhcccccccccceeehhhhhhhhhhhhhccccc 

COILS 

MEM 

SEQ IQAEEILQTYLKSKESVTDAILQTDQILTEKEKEIEVECVKAESAQASAKMVEEMQIKYQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ QMMEEKEKSYQEHVKQLTEKMERERAQLLEEQEKTLTSKLQEQARVLKERCQGESTQLQN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 

COILS cccccccccccccccccccccccccccccc cccccccccccccccccccccc 

MEM 

SEQ EIQKLQKTLKKKTKRYMSHKLKI 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhccc 

COILS ccccccc 

MEM 


Prosite for DKFZphfbr2_78c24.3 

PS00016 272->275 RGD PDOC00016 
PS00017 45->53 ATP_GTP_A PDOC00017 


(No Pfam data available for DKFZphfbr2_78c24.3) 


DKFZphfbr2_78dl3 


group: brain derived 

DKFZphfbr2_78dl3 encodes a novel 259 amino acid protein with similarity to C. elegans putative 
protein from cosmid K08B12. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to C. elegans K08B12.3 
Sequenced by MediGenomix 

Locus: /map="338.4 cR from top of Chr15 linkage group" 
Insert length: 2195 bp 

Poly A stretch at pos. 2175, polyadenylation signal at pos. 2156 

1 CGTCCGTCGG GCAGCAGCGG GGCTGTCTAT CCCGGCTGAG GACCCGCGGC 
51 CAGTGCGGGT GGCTGGCTTT GCCATTAGCG GGGGCCTTTC CTGAGGACGG 

101 CGTACGGAGT GTGGGGAATG AAGGATGGCA GCATGCCGTG CATTAAAAGC 

151 TGTTTTGGTA GATCTCAGTG GCACACTTCA CATTGAAGAT GCAGCTGTGC 

201 CAGGCGCACA GGAAGCTCTT AAAAGGTTAC GTGGTGCTTC TGTAATCATT 

251 AGGTTTGTGA CCAATACAAC CAAAGAGAGC AAGCAAGACC TGTTAGAAAG 

301 GTTGAGAAAA TTGGAATTTG ATATCTCTGA AGATGAAATA TTCACATCTC 

351 TGACTGCAGC CAGAAGTTTA CTAGAGCGGA AACAAGTCAG ACCCATGCTG 

401 CTAGTTGATG ATCGGGCACT ACCTGATTTC AAAGGAATAC AAACAAGTGA 

451 TCCTAATGCT GTGGTCATGG GATTGGCACC AGAACATTTT CATTATCAAA 

501 TTCTGAATCA AGCATTCCGG TTACTCCTGG ATGGAGCACC TCTGATAGCA 

551 ATCCACAAAG CCAGGTATTA CAAGAGGAAA GATGGCTTAG CCCTGGGGCC 

601 TGGACCATTT GTGACTGCTT TAGAGTATGC CACAGATACC AMGCCACAG 

651 TCGTGGGGAA ACCAGAGAAG ACGTTCTTTT TGGAAGCATT GCGGGGCACT 

701 GGCTGTGAAC CTGAGGAGGC TGTCATGATA GGAGATGATT GCAGGGATGA 

751 TGTTGGTGGG GCTCAAGATG TCGGCATGCT GGGCATCTTA GTAAAGACTG 

801 GGAAATATCG AGCATCAGAT GPAGPAMKM TTAATCCACC TCCTTACTTA 

851 ACTTGTGAGA GTTTCCCTCA TGCTGTGGAC CACATTCTGC AGCACCTATT 

901 GTGAAGCAAT GTGTGCATCT GAAGCAACTT GAAATGCAGC TTCTTATTGT 

951 CTGGAATGAA TCCCTTACCA ACTCAGTGCC AGCATCGGTA GACACCAGTC 
1001 AGTGCTGATC GCTTTTTAAC CCTCTTTTGT TGTGCATTAA TTAGAAAGAA 
1051 AGGTATTGAA TTGCGGCTAG CCAGTAAGCC TTGCTAATCT CTTTTATTTT 
1101 GTAACTGAAG ATGAGACCCA AAGAAAGGGA AAGCTGAGAT TTTGTGCCAT 
1151 TCCTTTTAAA ATATTCATCA GGTTAGGTGG GGCTGTGGGG GAAAAGCTAC 
1201 TACAGGGAAG AGTGTTCTCT GCTGTCTCTT CACTGGAAAA CAGGGAGGGG 
1251 GGATTTCAGA CTGTGAAGAA AGTTGAATGG TGGTTTTTAA ATTATAAAGT 
1301 AATGTATTAA AAGGTGCATT AGGCTGTAGT TCTAATATTG AGTTCAACTG 
1351 TGAAATCCAT CAGATGTGCC AAATGGAGAA GACAGAAAGC AACAAAGTGA 
1401 ATTGTTCTTT AGCCCAAGTG GTACAGTGAA TTTGCTTTAA CAGATGTTGA 
1451 AAACTAAATT TTCTACTGTA TTCCCAGCAC GGGTGACTTC TTTTTCTCTT 
1501 CATTAGCCAG AGATGACTAA TTTAAATTTA GAACCAGATT TTAATTTAAA 
1551 TTAATATTTC CATTAATAAC CTACTCATTG CAGATACCTA TTATACTGTG 
1601 TAACAGTTGT TTTGGAAATT TTATGTAAAA TTAAAACTAT CAGTATTTTA 
1651 CAGATGTTTT AATTAGACAT TGTTATTAAC AGGAACAGTG CAGAAACTAG 
1701 AATCAAGCCT TATAATATCT TATAGACCAT GCATTTTTGA AGTTAGTGTC 
1751 CACTAGGGTC CTATTAACTG TACATTTGCA AGATTTCATT ATTTTTGCCT 
1801 CTGACACTAT GGGAAAAATT TTTTAGAAGC TATTGGGACA GATTCAAGCT 
1851 TTTATGCACT TGGTTACTAC AGCTGTAAAA TGAAATCTCG TCTTGTAGCA 
1901 TGGATTATTC TTCTCATGTT AAACCCACCA AAATAAAGGG GACTAAATAG 
1951 GTAATGATTT TCCTAGTGCA TTTGCATACT GTGATAATCC TGGGCCTTGC 
2001 AATAGTTCTA CAGGGCTCTT GGGCATTGAA TTATTAGGAT GTAATTGTAC 
2051 ATCATTGTAG TGTTCACCTT ATTGAAGCTC ACTCTGATGT TAATGAGCTT 
2101 CGGGTTTTGA TGCTTGTTTA GAGATCAGCA GTCTTGGATG GGAGGGAACA 
2151 AAGCTAAATA AATGTTAGTT TGGTGAAAAA AAAAAAAAAA AAAAA 

BLAST Results 


Entry HS599355 from database EMBL: 
human STS WI-13484. 
Score = 1262, P = 3.6e-52, identities = 274/289 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 125 bp to 901 bp; peptide length: 259 
Category: similarity to unknown protein 
Classification: no clue 

1 MAACRALKAV LVDLSGTLHI EDAAVPGAQE ALKRLRGASV IIRFVTNTTK 

51 ESKQDLLERL RKLEFDISED EIFTSLTAAR SLLERKQVRP MLLVDDRALP 

101 DFKGIQTSDP NAVVMGLAPE HFHYQILNQA FRLLLDGAPL IAIHKARYYK 

151 RKDGLALGPG PFVTALEYAT DTKATVVGKP EKTFFLEALR GTGCEPEEAV 

201 MIGDDCRDDV GGAQDVGMLG ILVKTGKYRA SDEEKINPPP YLTCESFPHA 

251 VDHILQHLL 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78dl3, frame 2 

TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid 
K08B12., N = 1, Score = 609, P = 2.2e-59 

TREMBL:CEC13C4_5 gene: "C13C4.4"; Caenorhabditis elegans cosmid C13C4, 
N = 1, Score = 408, P = 4.4e-38 


>TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid 
K08B12. 

Length = 257 

HSPs: 

Score = 609 (91.4 bits), Expect = 2.2e-59, P = 2.2e-59 
Identities = 132/251 (52%), Positives = 172/251 (68%) 

Query:   7 LKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERLRKLEFD 66 
           + +VL+DLSGT+HIE+ A+PGAQ AL+ LR  + + +FVTNTTKESK+ L +RL    F 
Sbjct:   4 ISSVLIDLSGTIHIEEFAIPGAQTALELLRQHAKV-KFVTNTTKESKRLLHQRLINCGFK 62 

Query:  67 ISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPEHFHYQI 126 
           + ++EIFTSLTAAR L+ + Q RP  +VDDRA+ DF+GI T DPNAVV+GLAPE F+ 
Sbjct:  63 VEKEEIFTSLTAARDLIVKNQYRPFFIVDDRAMEDFEGISTDDPNAVVIGLAPEKFNDTT 122 

Query: 127 LNQAFRLLLDG-APLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKPEKTFF 185 
           L  AFRL+ +  A LIAI+K RY++   GL LGPG +V  LEY+   +AT+VGKP K FF 
Sbjct: 123 LTHAFRLIKEKKASLIAINKGRYHQTNAGLCLGPGTYVAGLEYSAGVEATIVGKPNKLFF 182 

Query: 186 LEALRGTG--CEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPPYLT 243 
             AL+      +   AVMIGDD  DD  GA  +GM  ILVKTGK+R  DE K+ 
Sbjct: 183 ESALQSLNENVDFSSAVMIGDDVNDDALGAIKIGMRAILVKTGKFRDGDELKVKN----V 238 

Query: 244 CESFPHAVDHILQH 257 
             SF  AV+ I+++ 
Sbjct: 239 ANSFVDAVNMIIEN 252 


Pedant information for DKFZphfbr2_78dl3, frame 2 


Report for DKFZphfbr2_78dl3.2 

[LENGTH] 259
[MW] 28536.04
[pI] 5.84
[HOMOL] TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid K08B12. 3e-62
[FUNCAT] r general function prediction [M. jannaschii, MJ1437] 3e-05
[SUPFAM] nagD protein 4e-18
[KW] Alpha_Beta

SEQ MAACRALKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERL 
PRD ccccccceeeeeecccceeeecccccchhhhhhhhhhccceeeeeeccccchhhhhhhhh 

SEQ RKLEFDISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPE 
PRD hhhccccccceeeehhhhhhhhhhhhccceeeeeechhhhhhccccccccceeeeecccc 

SEQ HFHYQILNQAFRLLLDGAPLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKP 
PRD chhhhhhhhhhhhhhccceeeeeccccccccccccccccccchhhhhhhhccceeeeccc 

SEQ EKTFFLEALRGTGCEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPP 

PRD cchhhhhhhhhhccccceeeeecccchhhhhhhhhccceeeeeeeccccccccccccccc 

SEQ YLTCESFPHAVDHILQHLL 
PRD cccccchhhhhhhhhhccc 

(No Prosite data available for DKFZphfbr2_78dl3.2) 
(No Pfam data available for DKFZphfbr2_78dl3.2) 


DKFZphfbr2_78k24 


group: metabolism 

DKFZphfbr2_78k24 encodes a novel 372 amino acid protein with similarity to Mus musculus 
ubiquitin specific protease UBP43. 

The novel protein contains a Prosite ubiquitin carboxyl-terminal hydrolases family 2 signature 
2. Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are 
thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. 

The new protein can find application in modulation of protein stability/degradation in cells. 


Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 


strong similarity to mouse ubiquitin specific protease UBP43 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1874 bp 

Poly A stretch at pos. 1852, polyadenylation signal at pos. 1836 


1 AGTCCCGACG TGGAACTCAG CAGCGGAGGC TGGACGCTTG CATGGCGCTT 
51 GAGAGATTCC ATCGTGCCTG GCTCACATAA GCGCTTCCTG GAAGTGAAGT 
101 CGTGCTGTCC TGAACGCGGG CCAGGCAGCT GCGGCCTGGG GGTTTTGGAG 
151 TGATCACGAA TGAGCAAGGC GTTTGGGCTC CTGAGGCAAA TCTGTCAGTC 
201 CATCCTGGCT GAGTCCTCGC AGTCCCCGGC AGATCTTGAA GAAAAGAAGG 
251 AAGAAGACAG CAACATGAAG AGAGAGCAGC CCAGAGAGCG TCCCAGGGCC 
301 TGGGACTACC CTCATGGCCT GGTTGGTTTA CACAACATTG GACAGACCTG 
351 CTGCCTTAAC TCCTTGATTC AGGTGTTCGT AATGAATGTG GACTTCACCA 
401 GGATATTGAA GAGGATCACG GTGCCCAGGG GAGCTGACGA GCAGAGGAGA 
451 AGCGTCCCTT TCCAGATGCT TCTGCTGCTG GAGAAGATGC AGGACAGCCG 
501 GCAGAAAGCA GTGCGGCCCC TGGAGCTGGC CTACTGCCTG CAGAAGTGCA 
551 ACGTGCCCTT GTTTGTCCAA CATGATGCTG CCCAACTGTA CCTCAAACTC 
601 TGGAACCTGA TTAAGGACCA GATCACTGAT GTGCACTTGG TGGAGAGACT 
651 GCAGGCCCTG TATACGATCC GGGTGAAGGA CTCCTTGATT TGCGTTGACT 
701 GTGCCATGGA GAGTAGCAGA AACAGCAGCA TGCTCACCCT CCCACTTTCT 
751 CTTTTTGATG TGGACTCAAA GCCCCTGAAG ACACTGGAGG ACGCCCTGCA 
801 CTGCTTCTTC CAGCCCAGGG AGTTATCAAG CAAAAGCAAG TGCTTCTGTG 
851 AGAACTGTGG GAAGAAGACC CGTGGGAAAC AGGTCTTGAA GCTGACCCAT 
901 TTGCCCCAGA CCCTGACAAT CCACCTCATG CGATTCTCCA TCAGGAATTC 
951 ACAGACGAGA AAGATCTGCC ACTCCCTGTA CTTCCCCCAG AGCTTGGATT 
1001 TCAGCCAGAT CCTTCCAATG AAGCGAGAGT CTTGTGATGC TGAGGAGCAG 
1051 TCTGGAGGGC AGTATGAGCT TTTTGCTGTG ATTGCGCACG TGGGAATGGC 
1101 AGACTCCGGT CATTACTGTG TCTACATCCG GAATGCTGTG GATGGAAAAT 
1151 GGTTCTGCTT CAATGACTCC AATATTTGCT TGGTGTCCTG GGAAGACATC 
1201 CAGTGTACCT ACGGAAATCC TAACTACCAC TGGCAGGAAA CTGCATATCT 
1251 TCTGGTTTAC ATGAAGATGG AGTGCTAATG GAAATGCCCA AAACCTTCAG 
1301 AGATTGACAC GCTGTCATTT TCCATTTCCG TTCCTGGATC TACGGAGTCT 
1351 TCTAAGAGAT TTTGCAATGA GGAGAAGCAT TGTTTTCAAA CTATATAACT 
1401 GAGCCTTATT TATAATTAGG GATATTATCA AAATATGTAA CCATGAGGCC 
1451 CCTCAGGTCC TGATCAGTCA GAATGGATGC TTTCACCAGC AGACCCGGCC 
1501 ATGTGGCTGC TCGGTCCTGG GTGCTCGCTG CTGTGCAAGA CATTAGCCCT 
1551 TTAGTTATGA GCCTGTGGGA ACTTCAGGGG TTCCCAGTGG GGAGAGCAGT 
1601 GGCAGTGGGA GGCATCTGGG GGCCAAAGGT CAGTGGCAGG GGGTATTTCA 
1651 GTATTATACA ACTGCTGTGA CCAGACTTGT ATACTGGCTG AATATCAGTG 
1701 CTGTTTGTAA TTTTTCACTT TGAGAACCAA CATTAATTCC ATATGAATCA 
1751 AGTGTTTTGT AACTGCTATT CATTTATTCA GCAAATATTT ATTGATCATC 
1801 TCTTCTCCAT AAGATAGTGT GATAAACACA GTCATGAATA AAGTTATTTT 
1851 CCACAAAAAA AAAAAAAAAA AAAA 


BLAST Results 


Entry AC005500 from database EMBL: 
, complete sequence. 

Score = 859, P = 5.7e-143, identities = 175/179 
8 exons matching Bp 317-1230 


Medline entries 


99182491: 

A novel ubiquitin-specific protease, UBP43, cloned from leukemia 
fusion protein AML1-ETO-expressing mice, functions in 
hematopoietic cell differentiation. 


Peptide information for frame 1 


ORF from 160 bp to 1275 bp; peptide length: 372 
Category: strong similarity to known protein 
Classification: Protein management 
Prosite motifs: UCH_2_2 (302-320) 


1 MSKAFGLLRQ ICQSILAESS QSPADLEEKK EEDSNMKREQ PRERPRAWDY 
51 PHGLVGLHNI GQTCCLNSLI QVFVMNVDFT RILKRITVPR GADEQRRSVP 
101 FQMLLLLEKM QDSRQKAVRP LELAYCLQKC NVPLFVQHDA AQLYLKLWNL 
151 IKDQITDVHL VERLQALYTI RVKDSLICVD CAMESSRNSS MLTLPLSLFD 
201 VDSKPLKTLE DALHCFFQPR ELSSKSKCFC ENCGKKTRGK QVLKLTHLPQ 
251 TLTIHLMRFS IRNSQTRKIC HSLYFPQSLD FSQILPMKRE SCDAEEQSGG 
301 QYELFAVIAH VGMADSGHYC VYIRNAVDGK WFCFNDSNIC LVSWEDIQCT 
351 YGNPNYHWQE TAYLLVYMKM EC 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78k24, frame 1 

TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus 
musculus ubiquitin specific protease UBP43 mRNA, complete cds., N = 1, 
Score = 1367, P = 1e-139 

SWISSPROT:UBPE_DROME UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 64E (EC 
3.1.2.15) (UBIQUITIN THIOLESTERASE 64E) (UBIQUITIN-SPECIFIC PROCESSING 
PROTEASE 64E) (DEUBIQUITINATING ENZYME 64E)., N = 2, Score = 248, P = 
5.3e-33 


>TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus 
musculus ubiquitin specific protease UBP43 mRNA, complete cds. 
Length = 368 

HSPs: 

Score = 1367 (205.1 bits), Expect = 1.0e-139, P = 1.0e-139 
Identities = 262/369 (71%), Positives = 295/369 (79%) 


Query:   1 MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 60 
           M K FGLLR+ CQS++AE  Q  A LEE  E     KR   R+   AWD PHGLVGLHNI 
Sbjct:   1 MGKGFGLLRKPCQSVVAEPQQYSA-LEE--ERTMKRKRVLSRDLCSAWDSPHGLVGLHNI 57 

Query:  61 GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 120 
           GQTCCLNSL+QVF+MN+DF  ILKRITVPR A+E++RSVPFQ+LLLLEKMQDSRQKA+ P 
Sbjct:  58 GQTCCLNSLLQVFMMNMDFRMILKRITVPRSAEERKRSVPFQLLLLLEKMQDSRQKALLP 117 

Query: 121 LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 180 
            EL  CLQK NVPLFVQHDAAQLYL +WNL KDQITD  L ERLQ L+TI  ++SLICV 
Sbjct: 118 TELVQCLQKYNVPLFVQHDAAQLYLTIWNLTKDQITDTDLTERLQGLFTIWTQESLICVG 177 

Query: 181 CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 240 
           C  ESSR S +LTL L LFD D+KPLKTLEDAL CF QP+EL+S   C CE CG+KT  K 
Sbjct: 178 CTAESSRRSKLLTLSLPLFDKDAKPLKTLEDALRCFVQPKELASSDMC-CETCGEKTPWK 236 

Query: 241 QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 300 
           QVLKLTHLPQTLTIHLMRFS RNS+T KICHS+ FPQSLDFSQ+LP + +  D +EQS 
Sbjct: 237 QVLKLTHLPQTLTIHLMRFSARNSRTEKICHSVNFPQSLDFSQVLPTEEDLGDTKEQSEI 296 

Query: 301 QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 360 
            YELFAVIAHVGMAD GHYC YIRN VDGKWFCFNDS++C V+W+D+QCTYGN  Y W+E 
Sbjct: 297 HYELFAVIAHVGMADFGHYCAYIRNPVDGKWFCFNDSHVCWVTWKDVQCTYGNKRYRWRE 356 

Query: 361 TAYLLVYMK 369 
           TAYLLVY K 
Sbjct: 357 TAYLLVYTK 365 

Pedant information for DKFZphfbr2_78k24, frame 1 

Report for DKFZphfbr2_78k24.1 

[LENGTH] 372
[MW] 43011.12
[pI] 8.05
[HOMOL] TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus musculus ubiquitin specific protease UBP43 mRNA, complete cds. 1e-151
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YMR304w] 3e-19
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YJL197w] 3e-16
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 1e-15
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 6e-12
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 9e-11
[BLOCKS] BL00582A Ribosomal protein L33 proteins
[BLOCKS] BL00972E
[BLOCKS] BL00972D
[BLOCKS] BL00972A
[EC] 2.4.2.29 Queuine tRNA-ribosyltransferase 1e-06
[PIRKW] pentosyltransferase 1e-06
[PIRKW] glycosyltransferase 1e-06
[PIRKW] tRNA modification 1e-06
[PIRKW] alternative splicing 7e-11
[PIRKW] hydrolase 7e-06
[SUPFAM] deubiquitinating enzyme SSV7 2e-09
[PROSITE] UCH_2_2 1
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2
[KW] Alpha_Beta

SEQ MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 
PRD cccceeechhhhhhhhcccccccchhhhhhhhcccccccccccccccccccccccccccc 

SEQ GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 
PRD cceeehhhhhhhhhcccchhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhccccc 

SEQ LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD
PRD hhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhheeeee 

SEQ CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 
PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhcccccccceeecccccccccc 

SEQ QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 
PRD cceeeecccchhhhhhhhhhhccchhhhhccccccccccccccccccccccccccccccc 

SEQ QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 
PRD eeeeeeeeeeeccccccceeeeeecccccceeeeccceeeeeecccccccccccccchhh 

SEQ TAYLLVYMKMEC 
PRD hhhhhhhhhccc 


Prosite for DKFZphfbr2_78k24.1
PS00973 302->320 UCH_2_2 PDOC00750

Pfam for DKFZphfbr2_78k24.1

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2

HMM      *GIqNlGNTCYMNSHQCL*
          G+ N+G TC +NS+IQ+
Query  56 GLHNIGQTCCLNSLIQVF 73

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2

HMM      *YdLYgVICHyGntldyGHYWaYVKNenhHRWkWYyFDDEtV*
          Y+L++VI H G   D+GHY +y++N    ++KW++F+D+++
Query 302 YELFAVIAHVG-MADSGHYCVYIRNAV--DGKWFCFNDSNI 339




DKFZphfbr2_78n23 


group: brain derived 

DKFZphfbr2_78n23 encodes a novel 329 amino acid protein with similarity to A.thaliana 
F26P21.80 protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 


similarity to A.thaliana F26P21.80 
Sequenced by MediGenomix 

Locus: /map="89.1 cR from top of Chr19 linkage group"

Insert length: 1447 bp 

Poly A stretch at pos. 1374, polyadenylation signal at pos. 1353 
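
The poly A stretch and polyadenylation signal reported above can be located with a simple string search. The following is a minimal sketch, not the annotation pipeline's actual method: it assumes the canonical AATAAA hexamer as the signal and a run of at least ten A residues as the poly A stretch, and the function name `find_polya_features` and the `min_run` threshold are illustrative choices.

```python
def find_polya_features(seq, signal="AATAAA", min_run=10):
    """Return 0-based offsets of (signal hexamer, first poly A run after it).

    Either value is -1 when the feature is not found. Blanks in the
    listing-style input (groups of ten bases) are ignored.
    """
    seq = "".join(seq.split()).upper()
    sig = seq.find(signal)
    # Search for the A run only downstream of the signal, if one was found.
    search_from = sig + len(signal) if sig >= 0 else 0
    tail = seq.find("A" * min_run, search_from)
    return sig, tail


# Fragment copied from positions 1351-1390 of the insert sequence below.
fragment = "CAAAATAAAA ATTTGAGACT CGTTAAAAAA AAAAAAAAAA"
print(find_polya_features(fragment))
```

Adding the fragment's start position (1351) to the two offsets gives 1-based positions 1354 and 1375, one more than the reported 1353 and 1374, which suggests (but does not prove) that the reported positions are counted from zero.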


1 TACAACTTCC GGCTGTAAAG ATGGCGGCTT CCTAGTGAGT CGGCGGCTGA 
51 CTTAGAAGGA GGTTCAGGCT ACGGTGAGCC GAAGCCACAC AGGAGCCATG 
101 GAAGTGGCAG AGCCCAGCAG CCCCACTGAA GAGGAGGAGG AGGAAGAGGA 
151 GCACTCGGCA GAGCCTCGGC CCCGCACTCG CTCCAATCCT GAAGGGGCTG 
201 AGGACCGGGC AGTAGGGGCA CAGGCCAGCG TGGGCAGCCG CAGCGAGGGT 
251 GAGGGTGAGG CCGCCAGTGC TGATGATGGG AGCCTCAACA CTTCAGGAGC 
301 CGGCCCTAAG TCCTGGCAGG TGCCCCCGCC AGCCCCTGAG GTCCAAATTC 
351 GGACACCAAG GGTCAACTGT CCAGAGAAAG TGATTATCTG CCTGGACCTG 
401 TCAGAGGAAA TGTCACTGCC AAAGCTGGAG TCGTTCAACG GCTCCAAAAC 
451 CAACGCCCTC AATGTCTCTC AGAAGATGAT TGAGATGTTC GTGCGGACAA
501 AACACAAGAT CGACAAAAGC CACGAGTTTG CACTGGTGGT GGTGAACGAT 
551 GACACGGCCT GGCTGTCTGG CCTGACCTCC GACCCCCGCG AGCTCTGTAG 
601 CTGCCTCTAT GATCTGGAGA CGGCCTCCTG TTCCACCTTC AATCTGGAAG 
651 GACTTTTCAG CCTCATCCAG CAGAAAACTG AGCTTCCGGT CACAGAGAAC 
701 GTGCAGACGA TTCCCCCGCC ATATGTGGTC CGCACCATCC TTGTCTACAG 
751 CCGTCCACCT TGCCAGCCCC AGTTCTCCTT GACGGAGCCC ATGAAGAAAA
801 TGTTCCAGTG CCCATATTTC TTCTTTGACG TTGTTTACAT CCACAATGGC 
851 ACTGAGGAGA AGGAGGAGGA GATGAGTTGG AAGGATATGT TTGCCTTCAT 
901 GGGCAGCCTG GATACCAAGG GTACCAGCTA CAAGTATGAG GTGGCACTGG 
951 CTGGGCCAGC CCTGGAGTTG CACAACTGCA TGGCGAAACT GTTGGCCCAC 
1001 CCCCTGCAGC GGCCTTGCCA GAGCCATGCT TCCTACAGCC TGCTGGAGGA 
1051 GGAGGATGAA GCCATTGAGG TTGAGGCCAC TGTCTGAACC ATCCCTGTAC 
1101 ATCTGCACCT TCTTGTGCAA GGAAGTCCTT GGCCTAAAGC CTTGGTTCTC 
1151 AAACTGGGTT CCTTGGGACC TCCGGGGTGG GGGGGTTCCA GGAGGCACGT
1201 AGGGTACCTT GCAGGGTCCT AGGAGGGAAA CCCAGGATTC CAGGAGGGAT 
1251 CCCAGGAACT GTGGGCACCC ATTTTCTGTG TCTCCCAGCC CATTTCCACT 
1301 CCTAGTTTGT CATGGATAAT TTTTGTTCTT CCCTGTGTGA TTTTTGCCAT 
1351 CAAAATAAAA ATTTGAGACT CGTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAAGAAA AAAAAAAAAA AAAAAAA 


BLAST Results 


Entry HS806352 from database EMBL: 
human STS EST192543. 
Score = 1285, P = 2.5e-5X, identities = 263/266


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 98 bp to 1084 bp; peptide length: 329 
Category: similarity to unknown protein 
Classification: no clue 
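
The ORF coordinates, the frame number and the peptide length quoted in these entries are mutually consistent, which can be checked with a few lines of arithmetic. This sketch assumes 1-based inclusive coordinates that exclude the stop codon; that convention is inferred from the numbers in this listing, and the helper name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Map 1-based inclusive ORF coordinates to (reading frame, codon count).

    Assumes the coordinates cover the coding codons only (no stop codon).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return frame, length // 3


print(orf_summary(98, 1084))  # ORF of DKFZphfbr2_78n23
```

For the ORF above this yields frame 2 and 329 codons, in agreement with the "Peptide information for frame 2" heading and the stated peptide length.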

1 MEVAEPSSPT EEEEEEEEHS AEPRPRTRSN PEGAEDRAVG AQASVGSRSE 




51 GEGEAASADD GSLNTSGAGP KSWQVPPPAP EVQIRTPRVN CPEKVIICLD 
101 LSEEMSLPKL ESFNGSKTNA LNVSQKMIEM FVRTKHKIDK SHEFALVVVN 
151 DDTAWLSGLT SDPRELCSCL YDLETASCST FNLEGLFSLI QQKTELPVTE 
201 NVQTIPPPYV VRTILVYSRP PCQPQFSLTE PMKKMFQCPY FFFDVVYIHN 
251 GTEEKEEEMS WKDMFAFMGS LDTKGTSYKY EVALAGPALE LHNCMAKLLA 
301 HPLQRPCQSH ASYSLLEEED EAIEVEATV 
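
The peptide above follows from the cDNA by translating the ORF with the standard genetic code. The block below is an illustrative sketch only: the codon table is built from the conventional TCAG ordering, the function name `translate` is an assumption, and unknown or garbled codons are rendered as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string from its first base up to the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 98-127 of the DKFZphfbr2_78n23 insert (start of the ORF).
print(translate("ATG GAAGTGGCAG AGCCCAGCAG CCCCACT"))
```

The thirty bases shown translate to MEVAEPSSPT, the first ten residues of the peptide listed above.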


BLASTP hits 

NO BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78n23, frame 2

PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana, N =
1, Score = 142, P = 1.5e-07


>PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana
Length = 264


Score = 142 (21.3 bits), Expect = 1.5e-07, P = 1.5e-07
Identities = 56/216 (25%), Positives = 97/216 (44%)

Query: 93 EKVIICLDL-SEEMSLPKLESFNGSKTNALNVSQKMIEMFVRTKHKIDKSHEFALVVVND 151

E ++IC+D+ +E M K NG + ++ I +F+ K 1+ H FA + 

Sbjct: 26 EDILICIDVDAESMVEMKTTGTNGRPLIRMECVKQAIILFIHNKLSINPDHRFAFATLAK 85 

Query: 152 DTAWLSG-LTSDPRELCSCLYDLE-TASCSTFNLEGLFSLIQQKTELPVTENVQTIPPPY 209 

AWL TSD + L L S S +L LF Q+ ++ +N 
Sbjct: 86 SAAWLKKEFTSDAESAVASLRGLSGNKSSSRAOLTLLFRAAAQEAKVSRAQN: R 138 

Query: 210 VVRTILVYSRPPCQPQFSLTEPMKKMFQCPYFFFDVVYIHNGTEEKEEEMSWKDMF-AFM 268

+ R IL+Y R +P p+ + F DV+Y+H ++ + +D++ + + 
Sbjct: 139 IFRVILIYCRSSMRPTHEW— PLNQKL FTLDVMYLH DKPSPDNCPQDVYDSLV 189 

Query: 269 GSLD--TKGTSYKYEVALAGPALELHNCMAKLLAHPLQRPCQ 308 

+++ ++ Y +E G A + M+ LL HP QR Q 
Sbjct: 190 DAVEHVSEYEGYIFESG-QGLARSVFKPMSMLLTHPQQRCAQ 230 
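
The percentages printed next to the identity and positive counts can be reproduced from the counts themselves. The sketch below assumes the percentages are truncated to whole numbers rather than rounded; that is an inference from the values in this listing (56/216 is printed as 25%, not 26%), and `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage of matching columns, truncated (not rounded)."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // aligned


print(blast_percent(56, 216), blast_percent(97, 216))
```

Applied to the alignment above, the helper returns 25 and 44, matching the reported "Identities = 56/216 (25%), Positives = 97/216 (44%)".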

Pedant information for DKFZphfbr2_78n23, frame 2

Report for DKFZphfbr2_78n23.2

[LENGTH] 329
[MW] 36560.10
[pI] 4.60
[HOMOL] PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 7e-07
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 9.73 %
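
The low-complexity percentage appears to be the fraction of residues masked with "x" in the SEG rows of the report. The following sketch illustrates that reading under stated assumptions: the SEG rows are passed as plain strings, every "x" marks one masked residue, and the helper name `masked_percent` is hypothetical.

```python
def masked_percent(seg_rows, length):
    """Percentage of residues marked 'x' across the SEG rows, to two decimals."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# Assumed mask counts for the 329-residue protein: 22 + 10 masked residues.
rows = ["." + "x" * 22, "x" * 10 + "..."]
print(masked_percent(rows, 329))
```

With 32 masked residues out of 329 this gives 9.73, the value shown in the LOW_COMPLEXITY keyword line.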


SEQ MEVAEPSSPTEEEEEEEEHSAEPRPRTRSNPEGAEDRAVGAQASVGSRSEGEGEAASADD 

SEG . xxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhccccccccccccccc 

SEQ GSLNTSGAGPKSWQVPPPAPEVQIRTPRVNCPEKVIICLDLSEEMSLPKLESFNGSKTNA

SEG 

PRD ccccccccccccccccccccceeeccccccccceeeeeccccccccccccccccccccee 

SEQ LNVSQKMIEMFVRTKHKIDKSHEFALVVVNDDTAWLSGLTSDPRELCSCLYDLETASCST

SEG 

PRD ehhhhhhhhhhhhhhhccccccceeeeeeccchhhhhcccccchhhhhhhhhcccccccc 

SEQ FNLEGLFSLIQQKTELPVTENVQTIPPPYVVRTILVYSRPPCQPQFSLTEPMKKMFQCPY

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccceeeeeeecccccccccccchhhhhheeee 

SEQ FFFDVVYIHNGTEEKEEEMSWKDMFAFMGSLDTKGTSYKYEVALAGPALELHNCMAKLLA

SEG 

PRD eeeeeeeeccccchhhhhhhhhhhhhhhhcccccccceeeeecccccchhh^^ 

SEQ HPLQRPCQSHASYSLLEEEDEAIEVEATV

SEG xxxxxxxxxx. . . 

PRD hcccccccccchhhhhhhhhhhhhhhccc

(No Prosite data available for DKFZphfbr2_78n23.2)
(No Pfam data available for DKFZphfbr2_78n23.2)

DKFZphfbr2_7a24


group: brain derived 

DKFZphfbr2_7a24 encodes a novel 142 amino acid protein with similarity to the C-terminal part
of transforming growth factor-beta activated kinases.

The novel protein shows similarity only to the C-terminus of such kinases; no kinase domain is
present.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to C-terminus of TGF-beta-activated kinase
complete cDNA, complete cds, EST hits
Sequenced by GBF

Locus: unknown

Insert length: 1697 bp 

No poly A stretch found, no polyadenylation signal found 

1 GGGGAGAGAG GGGTTGTGAA GGGAAGCGGA AGGGAAGGGA AGGGAGGTCC 
51 CGTGGGACGC TGGGGTCTGG GGTAGAGCAG GTAGCAGCGT GCTGCCCTGA 

101 CAGCTGTCTC CGCTCCTCAG ATTGTCAGTG GCTGCTATGC AGCAGGTGCA 

151 GCCTGGTCTC TCACTGAGTC TCTACTCCAC AAAGGCAACG ACTGGCCAAG 

201 GCAGTGGCTG GCTCTGGGTT ACACAAGTGC AGACACTCAA CTAAGTGAGC 

251 TGGAAGACCC AGGAGAAGGC GGAGGCTCAG GTGCCCACAT GATCAGCACA 

301 GCCAGGGTAC CTGCTGACAA GCCTGTACGC ATCGCCTTTA GCCTCAATGA 

351 CGCCTCAGAT GATACACCCC CTGAAGACTC CATTCCTTTG GTCTTTCCAG 

401 AATTAGACCA GCAGCTACAG CCCCTGCCGC CTTGTCATGA CTCCGAGGAA

451 TCCATGGAGG TGTTCAGACA GCACTGCCAA ATAGCAGAAG AATACCTTGA

501 GGTCAAAAAG GAAATCACCC TGCTTGAGCA AAGGAAGAAG GAGCTCATTG 

551 CCAAGTTAGA TCAGGCAGAA GAGGAGAAGG TGGATGCTGC TGAGCTGGTT 

601 CGGGAATTCG AGGCTCTGAC GGAGGAGAAT CGGACGTTGA GGTTGGCCCA 

651 GTCTCAATGT GTGGAACAAC TGGAGAAACT TCGAATACAG TATCAGAAGA

701 GGCAGGGCTC GTCCTAACTT TAAATTTTTC AGTGTGAGCA TACGAGGCTG 

751 ATGACTGCCC TGTGCTGGCC AAAAGATTTT TATTTTAAAT GAATAGTGAG 

801 TCAGATCTAT TGCTTCTCTG TATTACCCAC ATGACAACTG TCTATAATGA 

851 GTTTACTGCT TGCCAGCTTC TAGCTTGAGA GAAGGGATAT TTTAAATGAG 

901 ATCATTAACG TGAAACTATT ACTAGTATAT GTTTTTGGAG ATCAGAATTC 

951 TTTTCCAAAG ATATATGTTT TTTTCTTTTT TAGGAAGATA TGATCATGCT 
1001 GTACAACAGG GTAGAAAATG GTAAAAATAG ACTATTGACT GACCCAGCTA 
1051 AGAATCGCGG GCTGAGCAGA GTTAAACCAT GGGACAAACC CATAACATGT 
1101 TCAGCATAGT TTCACGTATG TGTATTTTTA AATTTCATGC CTTTAATATT 
1151 TCAAATATGC TCAAATTTAA ACTGTCAGAA ACTTCTCTGC ATGTATTTAT 
1201 ATTTGCCAGA GTATAAACTT TTATACTCTG ATTTTTATCC TTCAATGATT 
1251 GATTATACTA AGAATAAATG GTCACATATC CTAAAAGCTT CTTCATGAAA 
1301 TTATTAGCAG AAACCATGTT TGAAACCAAA GCACATTTGC CAATGCTAAC 
1351 TGGCTGTTGT AATAATAAAC AGATAAGGCT GCATTTGCTT CATGCCATGT 
1401 GACCTCACAG TAAACATCTC TGCCTTTGCC TGTGTGTGTT CTGGGGGAGG 
1451 GGGGACATGG AAAAATATTG TTTGGACATT ACTTGGGTGA GTGCCCATGA 
1501 AGACATCAGT GAACTTGTAA CTATTGTTTT GTTTTGGATT TAAGGAGATG 
1551 TTTTAGATCA GTAACAGCTA ATAGGAATAT GCGAGTAAAT TCAGAATTGA 
1601 AACAATTTCT CCTTGTTCTA CCTATCACCA CATTTTCTCA AATTGAACTC 
1651 TTTGTTATAT GTCCATTTCT ATTCATGTAA CTTCTTTTTC ATTAAAC 

BLAST Results 

No BLAST result 

Medline entries 


98130593: 

Role of TAK1 and TAB1 in BMP signaling in early Xenopus
development.

Peptide information for frame 1


ORF from 289 bp to 714 bp; peptide length: 142
Category: similarity to known protein 


1 MISTARVPAD KPVRIAFSLN DASDDTPPED SIPLVFPELD QQLQPLPPCH 
51 DSEESMEVFR QHCQIAEEYL EVKKEITLLE QRKKELIAKL DQAEEEKVDA 
101 AELVREFEAL TEENRTLRLA QSQCVEQLEK LRIQYQKRQG SS 

BLASTP hits

Entry U92030_1 from database TREMBL:

product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA,
complete cds.

Score = 343, P = 1.3e-30, identities = 69/143, positives = 104/143

Entry AB009356_1 from database TREMBL:

product: "TGF-beta activated kinase 1a"; Homo sapiens mRNA for
TGF-beta activated kinase 1a, complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry MMPK_1 from database TREMBL:

product: "TAK1 (TGF-beta-activated kinase)"; Mouse mRNA for TAK1
(TGF-beta-activated kinase), complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry AB009357_1 from database TREMBL:

product: "TGF-beta activated kinase 1b"; Homo sapiens mRNA for
TGF-beta activated kinase 1b, complete cds.

Score = 339, P = 3.2e-30, identities = 67/143, positives = 104/143

Entry AB009358_1 from database TREMBL:

product: "TGF-beta activated kinase 1c"; Homo sapiens mRNA for
TGF-beta activated kinase 1c, complete cds.

Score = 144, P = 3.8e-09, identities = 30/67, positives = 47/67


Alert BLASTP hits for DKFZphfbr2_7a24 , frame 1 

PIR:JC5955 transforming growth factor-beta activated kinase (EC
-.-.-.-) 1a - Human, N = 1, Score = 339, P = 3e-30


>PIR:JC5955 transforming growth factor-beta activated kinase (EC -.-.-.-) 1a
- Human

Length = 579

HSPs:

Score = 339 (50.9 bits), Expect = 3.0e-30, P = 3.0e-30
Identities = 67/143 (46%), Positives = 104/143 (72%)

Query:   1 MISTARVPADKPVRI-AFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVF 59
           MI+T+   ++KP R   ++ +D++D    ++SIP+ +  LD QLQPL PC +S+ESM VF
Sbjct: 437 MITTSGPTSEKPTRSHPWTPDDSTDTNGSDNSIPMAYLTLDHQLQPLAPCPNSKESMAVF 496

Query:  60 RQHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRL 119
            QHC++A+EY++V+ EI LL QRK+EL+A+LDQ E+++ + + LV+E + L +EN++L
Sbjct: 497 EQHCKMAQEYMKVQTEIALLLQRKQELVAELDQDEKDQQNTSRLVQEHKKLLDENKSLST 556

Query: 120 AQSQCVEQLEKLRIQYQKRQGSS 142
              QC +QLE +R Q QKRQG+S
Sbjct: 557 YYQQCKKQLEVIRSQQQKRQGTS 579


Pedant information for DKFZphfbr2_7a24, frame 1 


Report for DKFZphfbr2_7a24.1

[LENGTH] 142
[MW] 16377.53
[pI] 4.64
[HOMOL] TREMBL:U92030_1 product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1
mRNA, complete cds. 6e-26
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] TNFR/NGFR cysteine-rich region
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.04 %
[KW] COILED_COIL 33.10 %


SEQ MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVFR
SEG xxxxxxxxxx
PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccccccccchhhhhhh
COILS

SEQ QHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRLA
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhh
COILS . . -CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QSQCVEQLEKLRIQYQKRQGSS
SEG
PRD hhhhhhhhhhhhhhhhhhhccc
COILS


Prosite for DKFZphfbr2_7a24.1

PS00001 114->118 ASN_GLYCOSYLATION PDOC00001
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00006 18->22 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006


Pfam for DKFZphfbr2_7a24.1

HMM_NAME TNFR/NGFR cysteine-rich region

HMM *CpeG t Yt DWNHvpqClpC t rCe PEMGQYMvqPCTwTQNTVC*
C++++ + + +Q C++ E+ ++++++ T + ++
Query 49 CHDSEESMEVF-RQH--CQIAEE--YLEVKKEITLLEQRKK 84

DKFZphfbr2_7e22


group: brain derived 

DKFZphfbr2_7e22.2 encodes a novel 286 amino acid protein similar to b561 cytochromes.

The new protein shows strong similarity to b561 cytochromes, but contains no heme binding
site. In addition, a myc-type helix-loop-helix dimerization domain is present. This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as
the myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.


strong similarity to cytochrome b561 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 4254 bp 

Poly A stretch at pos. 4234, polyadenylation signal at pos. 4217 


1 GGGGACTACC CAGAGGGCTG CCGCCGCCTC TCCAAGTTCT TGTGGCCCCC 
51 GCGGTGCGGA GTATGGGGCG CTGATGGCCA TGGAGGGCTA CCGGCGCTTC 
101 CTGGCGCTGC TGGGGTCGGC ACTGCTCGTC GGCTTCCTGT CGGTGATCTT 
151 CGCCCTCGTC TGGGTCCTCC ACTACCGAGA GGGGCTTGGC TGGGATGGGA 
201 GCGCACTAGA GTTTAACTGG CACCCAGTGC TCATGGTCAC CGGCTTCGTC 
251 TTCATCCAGG GCATCGCCAT CATCGTCTAC AGACTGCCGT GGACCTGGAA 
301 ATGCAGCAAG CTCCTGATGA AATCCATCCA TGCAGGGTTA AATGCAGTTG 
351 CTGCCATTCT TGCAATTATC TCTGTGGTGG CCGTGTTTGA GAACCACAAT 
401 GTTAACAATA TAGCCAATAT GTACAGTCTG CACAGCTGGG TTGGACTGAT 
451 AGCTGTCATA TGCTATTTGT TACAGCTTCT TTCAGGTTTT TCAGTCTTTC 
501 TGCTTCCATG GGCTCCGCTT TCTCTCCGAG CATTTCTCAT GCCCATACAT 
551 GTTTATTCTG GAATTGTCAT CTTTGGAACA GTGATTGCAA CAGCACTTAT 
601 GGGATTGACA GAGAAACTGA TTTTTTCCCT GAGAGATCCT GCATACAGTA 
651 CATTCCCGCC AGAAGGTGTT TTCGTAAATA CGCTTGGCCT TCTGATCCTG 
701 GTGTTCGGGG CCCTCATTTT TTGGATAGTC ACCAGACCGC AATGGAAACG 
751 TCCTAAGGAG CCAAATTCTA CCATTCTTCA TCCAAATGGA GGCACTGAAC
801 AGGGAGCAAG AGGTTCCATG CCAGCCTACT CTGGCAACAA CATGGACAAA 
851 TCAGATTCAG AGTTAAACAA TGAAGTAGCA GCAAGGAAAA GAAACTTAGC 
901 TCTGGATGAG GCTGGGCAGA GATCTACCAT GTAAAATGTT GTAGAGATAG 
951 AGCCATATAA CGTCACGTTT CAAAACTAGC TCTACAGTTT TGCTTCTCCT 
1001 ATTAGCCATA TGATAATTGG GCTATGTAGT ATCAATATTT ACTTTAATCA 
1051 CAAAGGATGG TTTCTTGAAA TAATTTGTAT TGATTGAGGC CTATGAACTG 
1101 ACCTGAATTG GAAAGGATGT GATTAATATA AATAATAGCA GATATAAATT 
1151 GTGGTTATGT TACCTTTATC TTGTTGAGGA CCACAACATT AGCACGGTGC 
1201 CTTGTGCAGA ATAGATACTC AATATGTGAA TATGTGTCTA CTAGTAGTTA 
1251 ATTGGATAAA CIGGCAGCAT CCCTGGCCTG TTGTCATGCA GTCATTTCCT 
1301 GTTAATTCTG GGAGACAATG ATTTCACAAC TAGAGGGAAG CAGTCCTAAA 
1351 AGTTTAAAAT CCGATAAGGA ATATCTGGGA CAGGGTTTAG ATCATGACTC 
1401 TACACAGATA CCATGATGAG AGTATATTAA AGAAATTTAG GAAAGCACCT 
1451 GGTTCCTTTC TCCCCATGCC TGCCTTCTGC TCCCTCCCCA GCTGGTTTGG
1501 GCTCAAATTG TCCCTGGAGA CTAGGGTTTA TGTTAGGGTA TTGATAGATT 
1551 AGAGCAGGTG GTTGAAGAGA TCTTCTCTGG TCAGACTTGG AAGAATTTCC 
1601 AAAAGTGAAG TTAGCCCCAA GACTTCCCTA GGGTTGATGT ACTTTATGAT 
1651 CCAGATGCTA AACTTCTTAG AATGAAAATA TGCTTCAACA CTTT^GTAGC 
1701 ATACACTGCC CTACAAACCT CAGAGAGCAC TTTTCCCCAA GTTCTTGTTT 
1751 TTATTTTTGA AAGTACTCAC ACAGCACTTA CTATGCTCCA AACACTCCTC 
1801 TAAGCACTTT ACACATATTA GCTCATTCAG TCCCCAGACA GACGGGATGA 
1851 AGTAGGTATT GTTACTGTTC CCATTTTACA GGTGAGAGAT TTGAAGCCTG 
1901 GGGAGGCTAG TAACTCACCC CAAGGTCACA CGGCTCATAC ATGGTGGGAC 
1951 TGAGACTCAG ATGCAGGCAG TCTGGCACCT CAGTCTGGAT TCTAACCATT 
2001 TCACTAAGCT ATTTTTGTCT TGTACTACTT TGACCCACCC CTGAATAT^C 
2051 CTCAATTGCT GGAGTGGGGT GTAGTTATTA AAGGGATGCT TTTTACCTTT 
2101 TGCTGTCTGC TGTGGCAGAT TCCCCAGATA ACCAAGGAAA AGGGGCCACC 
2151 CATACCTGGA AATAGGCCAT AGGGCCCCTA CTACTGCCAA CAAGCCATGG 
2201 CCTACCTTGA CACTTGTTTG ATCTTAAAAT TGTGTCTTGG TAACAAAAGA 
2251 TTTGGACAGG CATATCTGTA GCTTTCAAGT TAATTAATTG CAATATTTTT 
2301 TTCTTCAGGA TTTTAGCTGC TGAACAACTT TCAGTTTGGA GCTAAAAGAG 
2351 ACCTGTCTCA TGGTCTGCCC TTCCCTGGGG CAATAGCTAG GGTCTTTCCT 
2401 GATTTTTATG GAATTTTAGG GGATATTTTG AGCTTTGGGT TCTCAGTAGT
2451 GAATTGAGAC TTGGAGGTGA CTTTTCATGT TTGGAGTATC ATCTCTGTCT
2501 GGGCTCTGGG CTGACAAATT AAAACCTAGA GTAGTGCTTA TGCTGAAATG 
2551 ATACTTTTCA TTTTTTGGTT GATTTTTTTG CCTTCCCTTC AATTTTAAAC 
2601 TGAAGCATTT TAATGTGGGT AGAAACTCTA CACCAAATAC ACTAAACATT 
2651 TTGGTGCTTA GTGGATTTCT TTTTAGGTAA CTGGTACTTA CTTCCAAAGA 
2701 CTGAATACAA GCCACACTCC ATCATATCCC TTAAACTTCA TGAAAAACCA 
2751 TTCAAGATCC CCTTGCTGCA ACACTGTTCT CTTCTTCTCT ACTAAATTCT 
2801 ATTTCCAAAA TTGGTAATAG AGCCAGAAGG ATCCCCAGTA CCCAGCCCTC 
2851 TGCCTGGCAC AAAGTGGTAG CACTU^TTAAA TTCAGTATGG GTGGAGCATG 
2901 GTACAGTCTT GGTGCCATAG AAGGAGTAGT TGCATAGTCA CACATCATTT 
2951 GATAAGTTGG ATGTTCCATT ACATAGAGGA ACACAAAATT CCAGGGTTTT 
3001 TGGAGGAAGG GATTAGATAG CGACTAAGCC GCCAGAATTG AGGTGGCCAT 
3051 rCCTTTTTGT ATAGGCTAAG AAACAGGTTA TCAGTGAAAA GTTAATTATG 
3101 GCTTTGGCAC TAGAATAGCA CTGTTGCAAA GTATTTAAGC ACCCCCCATC 
3151 TCAGCCCTTT ATTTTATCTT TCATGTGGGC TAATGTGAGG ATAATCTTAC 
3201 AGATATTATA GGAATTTCTT TTCTATCTTT ATGAAAACAA CGTATATAAA 
3251 ATATATCTAG AAAACCTTTG TTTGAGACTC TTATTTAATG GGCTTTTGAT 
3301 TCTAATGATA ATTGTACCTT TATCTTTCAA AAGCTGATAT TTCCTACCTA 
3351 AGCATCTCCC GAGAAAAATA TCTCATTAAA AAGCCCATAA ATAATAGGGG 
3401 AGAAGAAAGC CTTAGGTATC AATTCCAAAA CAGTGATTGA AATTTCCCAA 
3451 AATAATTATG GCTTCTGTCA TCTCCAGAGA TAATCTGGCT TGGTTTACCC 
3501 CATAATCTAA TTTCAGAAAA GAAAGCTTTA TTTTAACACT CATCTGAATC 
3551 AACATTAAAG CCTTTTCTCT CAAAGCGTTT ATTGAGAAAC TCAAATGAAT 
3601 ATACTTTTTG AATTACTGTC ATCAAAAGTG TACGGCTTCC TGTGCTGCTT 
3651 GTGTCAAATG GAACCTGCCC TCTAAAGCAC TTTCTTTCCT TTACTTGCGT 
3701 GGTTTCATGT AAGCTGTGCT GTTTAGAAAC AACATCTCAG ACTTTACAAA 
3751 GAAATGACAA AGAAGGCAAT TGCACTTTTT AAGGGATATC GACAAGCAGT
3801 TTCTGTTTTC TAAAGGACAA AATACAGAGT GTGTGTCATT TTTAATTAGA 
3851 TTCTTTCCCC TGCTGAGTTG GAAATTCCAG TGCAGCACTG ATTGACCACA 
3901 GTTGCCAATC TAAAAGCACA AAGACAGAAG TAAAGCTTTA TGCTAATTTT 
3951 ATTTCAATAT GATAGAAAAT TTATCTTGGT ATGTCCTTTT TTAGATAACT 
4001 CCAGCAGGAA ACTGTAACTG CTATGTCTTT AGGAAAACGT AGAAGAAAGA 
4051 ACATTATTAT TCTTTAATTC CTACAAGGTA CTTGAAAACC TTAAGTGAAA 
4101 AAGATTTCTA TCTTTTTATC TTGGCGCATT TATGGAAAAA ATATTAACTG 
4151 TCCTGAATAT TTTATAATTT TGTAGGAAAA ATATGCATCT ATTTTTTCTT 
4201 GACTTCTTTT ATATAGTAAT AAAAGTTATT TTGGAAAAAA AAAAAAAAAA 
4251 AAAA 


BLAST Results 


Entry HSG20626 from database EMBL: 
human STS A005Z27. 
Score = 860, P = 3.0e-32, identities = 176/181


Medline entries 


89030633: 

The structure of cytochrome b561, a secretory vesicle-specific electron
transport protein.


Peptide information for frame 2 


ORF from 74 bp to 931 bp; peptide length: 286 
Category: strong similarity to known protein 
Classification: unset 


1 MAMEGYRRFL ALLGSALLVG FLSVIFALVW VLHYREGLGW DGSALEFNWH

51 PVLMVTGFVF IQGIAIIVYR LPWTWKCSKL LMKSIHAGLN AVAAILAIIS

101 VVAVFENHNV NNIANMYSLH SWVGLIAVIC YLLQLLSGFS VFLLPWAPLS

151 LRAFLMPIHV YSGIVIFGTV IATALMGLTE KLIFSLRDPA YSTFPPEGVF

201 VNTLGLLILV FGALIFWIVT RPQWKRPKEP NSTILHPNGG TEQGARGSMP

251 AYSGNNMDKS DSELNNEVAA RKRNLALDEA GQRSTM

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_7e22, frame 2

SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score
= 460, P = 1.3e-43

PIR:S01167 cytochrome b561 - bovine, N = 1, Score = 457, P = 2.7e-43

SWISSPROT:C561_PIG CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score =
452, P = 9.1e-43

PIR:S53321 cytochrome B561 - human, N = 1, Score = 451, P = 1.2e-42


>SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561).
Length = 252

HSPs:

Score = 460 (69.0 bits), Expect = 1.3e-43, P = 1.3e-43
Identities = 96/218 (44%), Positives = 131/218 (60%)


Query:  18 LVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVFIQGIAIIVYRLPWTWKC 77
           L+G   V     W+  YR G+ W+ SAL+FN HP+ MV G VF+QG A++VYR+
Sbjct:  23 LLGLTVVAMTGAWLGMYRGGIAWE-SALQFNVHPLCMVIGLVFLQGDALLVYRV--FRNE 79

Query:  78 SKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLHSWVGLIAVICYLLQLLS 137
           +K   K +H  L+  A ++A++ +VAVFE+H     A++YSLHSW G++    +  Q L
Sbjct:  80 AKRTTKVLHGLLHVFAFVIALVGLVAVFEHHRKKGYADLYSLHSWCGILVFALFFAQWLV 139

Query: 138 GFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTEKLIFSLRDPAYSTFPPE 197
           GFS FL P A  SLR+   P HV+ G  IF   +ATAL+GL E L+F L    YSTF PE
Sbjct: 140 GFSFFLFPGASFSLRSRYRPQHVFFGAAIFLLSVATALLGLKEALLFEL-GTKYSTFEPE 198

Query: 198 GVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTIL 235
           GV  N LGLL+  F  ++ +I+TR  WKRP +     L
Sbjct: 199 GVLANVLGLLLAAFATVVLYILTRADWKRPLQAEEQAL 236

Pedant information for DKFZphfbr2_7e22, frame 2 


Report for DKFZphfbr2_7e22.2

[LENGTH] 286
[MW] 31638.58
[pI] 9.12
[HOMOL] SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 4e-40
[PIRKW] transmembrane protein 9e-40
[KW] SIGNAL_PEPTIDE 40
[KW] TRANSMEMBRANE 5
[KW] LOW_COMPLEXITY 4.90 %

SEQ MAMEGYRRFLALLGSALLVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVF

SEG

PRD ccchhhhhhhhhhhhhhhhhhhhcchhhhhhhhhccccccccccccccccchhhhhhhhh

MEM MMMMMMMMMMMM

SEQ IQGIAIIVYRLPWTWKCSKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLH

SEG xxxxxxxxxxxxxx

PRD ccccceeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeecc

MEM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMM

SEQ SWVGLIAVICYLLQLLSGFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTE

SEG

PRD cccchhhhhhhhhhhhhhheeeeccccccccccccccceeeeeeeeeeehhhhhhhhhhh

MEM .... MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . .

SEQ KLIFSLRDPAYSTFPPEGVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTILHPNGG

SEG

PRD hhhhhhhccccccccccchhhhhhhhhhhhhhhheeeeeecccccccccccccccccccc

MEM .MMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ TEQGARGSMPAYSGNNMDKSDSELNNEVAARKRNLALDEAGQRSTM

SEG

PRD cccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccc

MEM 


(No Prosite data available for DKFZphfbr2_7e22.2)
(No Pfam data available for DKFZphfbr2_7e22.2)

DKFZphfbr2_7j4


group: brain derived 

DKFZphfbr2_7j4 encodes a novel 233 amino acid protein without similarity to known proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.


unknown 

complete cDNA, complete cds, 1 EST hit 

Sequenced by GBF 

Locus : unknown 

Insert length: 1050 bp 

Poly A stretch at pos. 1027, polyadenylation signal at pos. 1007 


1 GGGGACACAA AGGGGTGGTC ACCCTGCCCT CACCTTGACC TGTAAGTTGC 
51 CTAGGACAGT GGCCTGGTCC CAGGGGCTGT TGTGGGGAGT TGAAGAACAC 
101 CCTGGCCTCC TCCATCATGT CGGCCAAGAG GGCAGAATTG AAGAAAACAC 
151 ATCTGTGCAA GAACTACAAG GCAGTTTGCC TGGAATTGAA GCCAGAGCCG 
201 ACCAAAACAT TTGATTACAA AGCAGTTAAA CAAGAAGGGC GGTTTACCAA 
251 AGCAGGAGTG ACACAGGACC TAAAGAATGA ACTCAGGGAA GTGAGAGAAG 
301 AGCTCAAGGA GAAAATGGAG GAGATAAAAC AGATAAAGGA TCTAATGGAC 
351 AAGGATTTTG ATAAACTTCA CGAATTTGTG GAAATTATGA AGGAAATGCA 
401 GAAAGATATG GATGAGAAGA TGGACATTTT AATAAATACA CAGAAGAACT
451 ATAAGCTTCC CCTTAGAAGA GCACCAAAGG AGCAGCAGGA ACTCAGGCTG 
501 ATGGGAAAGA CTCACAGAGA ACCACAGCTC AGGCCCAAGA AAATGGATGG
551 AGCCAGTGGA GTCAATGGAG CACCCTGTGC TCTTCACAAG AAGACGATGG 
601 CACCACAAAA AACAAAACAG GGCTCACTGG ATCCCCTTCA TCACTGTGGG 
651 ACCTGCTGCG AGAAATGTTT GTTGTGTGCT CTAAAGAACA ACTACAATCG 
701 GGGGAACATT CCTTCAGAGG CCTCAGGCCT TTACAAAGGT GGAGAGGAGC 
751 CAGTGACCAC CCAACCTTCT GTGGGCCACG CTGTGCCTGC CCCAAAGTCC 
801 CAGACTGAGG GAAGGTGAAG CTTAACTGCC AGCTTGAAAT GAGAGTAAAG 
851 AAGATACAGA GCAAACAGTG TTTCAGAAAC TGTCCTGCCC TGGGTGTGAT 
901 TCTTTGGCTT CAATTTGAAG GAGGAGGAAT GATGGGATTT CATATTTTAT 
951 TTCACACCAG TTCCTCCTTG TTTCATCTCT TTGCTAAGCT GGCTGCTTCT 
1001 ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA 


ORF from 117 bp to 815 bp; peptide length: 233 
Category: putative protein 


1 MSAKRAELKK THLCKNYKAV CLELKPEPTK TFDYKAVKQE GRFTKAGVTQ 
51 DLKNELREVR EELKEKMEEI KQIKDLMDKD FDKLHEFVEI MKEMQKDMDE 
101 KMDILINTQK NYKLPLRRAP KEQQELRLMG KTHREPQLRP KKMDGASGVN 
151 GAPCALHKKT MAPQKTKQGS LDPLHHCGTC CEKCLLCALK NNYNRGNIPS 
201 EASGLYKGGE EPVTTQPSVG HAVPAPKSQT EGR 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


BLASTP hits 


Entry JC2223 from database PIR:
major surface glycoprotein 3 - Pneumocystis carinii (fragment)
Score = 109, P = 3.5e-04, identities = 41/136, positives = 67/136

Alert BLASTP hits for DKFZphfbr2_7j4, frame 3

TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for
P115C, partial sequence., N = 1, Score = 109, P = 0.00024


>TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for P115C,
partial sequence.

Length = 196

HSPs:

Score = 109 (16.4 bits), Expect = 2.4e-04, P = 2.4e-04
Identities = 41/134 (30%), Positives = 67/134 (50%)

Query:  14 CKN-YKAVCLELKPEPTKTFDYKAVKQEGRFTKA-GVTQDLKNELREVREELKEKMEEIK 71
           CK   K  C ELK       + K VK+    TK  G  ++LK+++++  E  KE++E  K
Sbjct:  22 CKTELKKYCEELKEADGLKVNDK-VKEICDDTKRDGKCKELKDKVKKELETFKEELE--K 78

Query:  72 QIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAPKEQQELRLMGK 131
            +KD+ D++ +K  E   +++E   D D K + +   +  YKL  +R   E   LR +GK
Sbjct:  79 ALKDIKDENCEKYEEKCILLEETNHD-DVKKNCVKLREGCYKLKRKRVA-EDLLLRALGK 136

Query: 132 THREPQLRPKKMDGAS 147
             +  +   K  D  S
Sbjct: 137 DVKNGECEKKMKDVCS 152


Pedant information for DKFZphfbr2_7j4, frame 3 


Report for DKFZphfbr2_7j4.3

[LENGTH] 233
[MW] 26533.95
[pI] 9.18
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 14.59 %
[KW] COILED_COIL 13.73 %


SEQ MSAKRAELKKTHLCKNYKAVCLELKPEPTKTFDYKAVKQEGRFTKAGVTQDLKNELREVR 

SEG xxxxxxxxx 

PRD ccchhhhhhhhhhccchhhhhhhcccccccccccceeecccccccccccchhhhhhhhhh 

COILS cccccccccccc

SEQ EELKEKMEEIKQIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAP 

SEG xxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhchhhhhhhhhcccccccccccc 

COILS cccccccccccccccccccc. 

SEQ KEQQELRLMGKTHREPQLRPKKMDGASGVNGAPCALHKKTMAPQKTKQGSLDPLHHCGTC

SEG 

PRD hhhhhhhhhccccccccccccccccccccccccchhhhhhcccccccccccccccccccc 
COILS 

SEQ CEKCLLCALKNNYNRGNIPSEASGLYKGGEEPVTTQPSVGHAVPAPKSQTEGR 

SEG 

PRD chhhhhhhccccccccccccccccccccccccccccccccccccccccccccc 

COILS 


Prosite for DKFZphfbr2_7j4.3

PS00005 2->5 PKC_PHOSPHO_SITE PDOC00005
PS00005 108->111 PKC_PHOSPHO_SITE PDOC00005
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00008 151->157 MYRISTYL PDOC00008
PS00008 196->202 MYRISTYL PDOC00008
PS00008 204->210 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_7j4.3)

DKFZphfbr2_82c20


group: transmembrane protein 

DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with very weak similarity to C.
elegans cosmid D1007.

The novel protein contains 7 transmembrane regions.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 


similarity to C. elegans D1007.5 ; 
membrane regions : 1 

Summary DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with
similarity to a hypothetical C. elegans protein. 


similarity to C. elegans D1007.5 

complete cDNA (Bp 1-100 GC rich), complete cds,
potential start at Bp 128 matches Kozak consensus PuNNatgG,
EST hits, localisation? primer B of STS doesn't match perfectly!
TRANSMEMBRANE 7 

Sequenced by DKFZ 

Locus: /map="109.9 cR from top of Chr1 linkage group" ???
Insert length: 1804 bp 

Poly A stretch at pos. 1794, no polyadenylation signal found 


1 CGGCGGGAGC GCGCGGCTGA TACCCGGGAC TGGGCTGCGG CGGTTAGTCC 
51 TCTCCCGGCC GCCGTCGCCT CCGACATATT GCTCGCAGGA GCTGCGGCGG 
101 CGAAGCGGAG AGCACCGGGG GGAGGAGATG GGAGGACGAA GAGGTCCCAA 
151 CAGGACATCT TACTGTCGAA ATCCGCTCTG TGAGCCGGGA TCCTCGGGGG 
201 GCTCTAGTGG AAGCCACACT TCCAGTGCAT CGGTGACCAG TGTTCGTTCC 
251 CGCACCAGGA GCAGTTCTGG AACAGGCCTC TCCAGCCCTC CTCTGGCCAC 
301 CCAAACTGTT GTGCCTCTAC AGCACTGCAA GATCCCCGAG CTGCCAGTCC 
351 AGGCCAGCAT TCTGTTTGAG TTGCAGCTCT TCTTCTGCCA GCTCATAGCA 
401 CTCTTCGTCC ACTACATCAA CATCTACAAG ACAGTGTGGT GGTATCCACC 
451 TTCCCACCCA CCCTCCCACA CCTCCCTGAA CTTCCATCTG ATCGACTTCA 
501 ACTTGCTGAT GGTGACCACC ATCGTTCTGG GCCGCCGCTT CATTGGGTCC 
551 ATCGTGAAGG AGGCCTCTCA GAGGGGGAAC GTCTCCCTCT TTCGCTCCAT 
601 CCTGCTGTTC CTCACTCGCT TCACCGTTCT CACGGCAACA GGCTGGAGTC 
651 TGTGCCGATC CCTCATCCAC CTCTTCAGGA CCTACTCCTT CCTGAACCTC 
701 CTGTTCCTCT GCTATCCGTT TGGGATGTAC ATTCCGTTCC TGCAGCTGAA 
751 TTGCGACCTC CGCAAGACAA GCCTCTTCAA CCACATGGCC TCCATGGGGC

801 CCCGGGAGGC GGTCAGTGGC CTGGCAAAGA GCCGGGACTA CCTCCTGACA 
851 CTGCGGGAGA CGTGGAAGCA GCACACAAGA CAGCTGTATG GCCCGGACGC 
901 CATGCCCACC CATGCCTGCT GCCTGTCACC CAGCCTCATC CGCAGTGAGG 
951 TGGAGTTCCT CAAGATGGAC TTCAACTGGC GCATGAAGGA AGTGCTCGTC 
1001 AGCTCCATGC TGAGCGCCTA CTATGTGGCC TTTGTGCCTG TCTGGTTCGT 
1051 GAAGAACACA CATTACTATG ACAAGCGCTG GTCCTGTGAA CTCTTCCTGC 
1101 TGGTGTCCAT CAGCACCTCC GTGATCCTCA TGCAGCACCT GCTGCCTGCC 
1151 AGCTACTGTG ACCTGCTGCA CAAGGCCGCC GCCCATCTGG GCTGTTGGCA 
1201 GAAGGTGGAC CCAGCGCTGT GCTCCAACGT GCTGCAGCAC CCGTGGACTG 
1251 AAGAATGCAT GTGGCCGCAG GGCGTGCTGG TGAAGCACAG CAAGAACGTC 
1301 TACAAAGCCG TAGGCCACTA CAACGTGGCT ATCCGCTCTG ACGTCTCCCA 
1351 CTTCCGCTTC CATTTCTTTT TCAGCAAACC TCTGCGGATC CTCAACATCC 
1401 TCCTGCTGCT GGAGGGCGCT GTCATTGTCT ATCAGCTGTA CTCCCTAATG 
1451 TCCTCTGAAA AGTGGCACCA GACCATCTCG CTGGCCCTCA TCCTCTTCAG 
1501 CAACTACTAT GCCTTCTTCA AGCTGCTCCG GGACCGCTTG GTATTGGGCA 
1551 AGGCCTACTC ATACTCTGCT AGCCCCCAGA GAGACCTGGA CCACCGTTTC 
1601 TCCTGAGCCC TGGGGTCACC TCAGGGACAG CGTCCAGGCT TCAGCCAAGG 
1651 GCTCCCTGGC AAGGGGCTGT TGGGTAGAAG TGGTGGTGGG GGGGACAAAA 
1701 GACAAAAAAA TCCACCAGAG CTTTGTATTT TTGTTACGTA CTGTTTCTTT 
1751 GATAATTGAT GTGATAAGGA AAAAAGTCCT ATTTTTATAC TCCCAAAAAA 
1801 AAAA 


BLAST Results 


Entry HS285343 from database EMBL: 
human STS WI-17488. 
Score = 1225, P = 1.3e-50, identities = 263/281 


Medline entries 


No Medline entry 


Peptide information for frame 2 


1 MGGRRGPNRT SYCRNPLCEP GSSGGSSGSH TSSASVTSVR SRTRSSSGTG 

51 LSSPPLATQT VVPLQHCKIP ELPVQASILF ELQLFFCQLI ALFVHYINIY 

101 KTVWWYPPSH PPSHTSLNFH LIDFNLLMVT TIVLGRRFIG SIVKEASQRG 

151 KVSLFRSILL FLTRFTVLTA TGWSLCRSLI HLFRTYSFLN LLFLCYPFGM 

201 YIPFLQLNCD LRKTSLFNHM ASMGPREAVS GLAKSRDYLL TLRETWKQHT 

251 RQLYGPDAMP THACCLSPSL IRSEVEFLKM DFNWRMKEVL VSSMLSAYYV 

301 AFVPVWFVKN THYYDKRWSC ELFLLVSIST SVILMQHLLP ASYCDLLHKA 

351 AAHLGCWQKV DPALCSNVLQ HPWTEECMWP QGVLVKHSKN VYKAVGHYNV 

401 AIPSDVSHFR FHFFFSKPLR ILNILLLLEG AVIVYQLYSL MSSEKWHQTI 

451 SLALILFSNY YAFFKLLRDR LVLGKAYSYS ASPQRDLDHR FS 

ORF from 128 bp to 1603 bp; peptide length: 492 
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (210-232) 
LEUCINE_ZIPPER (210-232) 
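
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the reported ORF excludes the stop codon; the source does not state this, but it is consistent with the three ORF lines in this section (128-1603 bp / 492 aa, 40-972 bp / 311 aa, 163-1581 bp / 473 aa). The function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length implied by 1-based, inclusive ORF coordinates.

    Assumption (not stated in the report): the ORF range excludes the stop codon.
    """
    span = end_bp - start_bp + 1  # number of nucleotides in the ORF
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    return span // 3


# (clone, ORF start, ORF end, reported peptide length) as quoted in the reports
REPORTED = [
    ("DKFZphfbr2_82c20", 128, 1603, 492),
    ("DKFZphfbr2_82e17", 40, 972, 311),
    ("DKFZphfbr2_82e4", 163, 1581, 473),
]

for clone, start, end, reported in REPORTED:
    implied = orf_peptide_length(start, end)
    status = "consistent" if implied == reported else "MISMATCH"
    print(f"{clone}: ORF {start}-{end} implies {implied} aa, reported {reported} ({status})")
```

A mismatch here would point to an OCR error in either the coordinates or the peptide length rather than to a problem with the sequence itself.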


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82c20, frame 2 

TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid 
D1007., N = 2, Score = 247, P = 4.6e-29 

>TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 
Length =512 

HSPs: 

Score = 247 (37.1 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29 
Identities = 58/204 (28%), Positives = 102/204 (50%) 

Query: 291 VSSMLSAYYVAFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKA 350 

+S ML 4V F + ++ W C+L ++V ++ + + +L P +Y DLLH+A 

Sbjct: 299 LSIMLPCIFVPFKTSQGIPQKILINEVWECQLAIWGLTAFSLYVAYLSPLNYLDLLHRA 358 

Query: 351 AAHLGCWQKVD-PAL CSNVLQHPWTEECMWPQGVLVKHSKN-VYKAVGHYNV 400 

A HLG W +++ P + + PW+E C++ G V+ Y+A ++ 

Sbjct: 359 AIHLGSWHQIEGPRIGHTGSMSSAPTPWSEFCLYNDGETVQMPDGRCYRAKSSNSIRTVA 418 

Query: 401 AIPSDVSHFRFHFFFSKPLRILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNY 460 

A P H F KP ++NI+ E +1 Q + L+ + W ++ L++F+NY 

Sbjct: 419 AHPESSRHNTFFKVLRKPNNLINIMCSFEFLLIFIQFWMLVLTNDWQHIVTFVLLMFANY 478 

Query: 461 YAFFKLLRDRLVLGKAYSYSASPQRDL 487 

F KL +D+++L + Y S Q DL 
Sbjct: 479 LLFAKLFKDKIILSRIYEPS QEDL 502 

Score = 178 (26.7 bits), Expect = 4.3e-21, Sum P(2) = 4.3e-21 
Identities = 50/179 (27%), Positives = 90/179 (50%) 

Query: 262 HACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYVAFVPVWFV— KNTHYYDKR— 317 

H C SP+ IR E++ L D R+K+ +■ + + +A+ +P FV K + ++ 
Sbjct: 262 HMCSDSPAQIREEIQVLIDDLVLRVKKSIFAGVSTAFLSIMLPCIFVPFKTSQGIPQKIL 321 

Query: 318 WSCELFLLVSISTSVILMQHLLPASYCDLLHKAAAHLGCWQKVD-PAL— -CSNV 368 

W C+L ++V ++ + + +L P +Y DLLH+AA HLG W +++ P + + 
Sbjct: 322 INEVWECQLAIWGLTAFSLYVAYLSPLNYLDLLHRAAIHLGSWHQIEGPRIGHTGSMSS 381 

Query: 369 LQHPWTEECMWPQGVLVKHSKN-VYKAVGHYNV-AIPSDVSHFRFHFFFSKPLRILNILL 426 

PW+E C++ G V+ Y+A ++ + + R + FF K LR N L+ 

Sbjct: 382 APTPWSEFCLYKDGETVQMPDGRCYRAKSSNSIRTVAAHPESSRHNTFF-KVLRKPNNLI 440 




Score = 146 (21.9 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29 
Identities = 34/85 (39%), Positives = 50/86 (58%) 

Query: 52 SSPPLATQTVVPLQHCKIPELP-VQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSH 110 

+S P A+ + + H P+4- Q + FE LF ++ALF+ Y+NIYKT+WW P S+ 
Sbjct: 19 ASIPRASGVTLSV-HPIWPDIQFTQGELFFECTLFLYSVLALFLQYLNIYKTLWWLPKSY 77 

Query: 111 PPSHTSLNFHLIDFNLLMVTTIVLGRR 137 

H SL FHLI+ L ++LG R 
Sbjct: 78 --WHYSLKFHLINPYFLSCVGLLLGWR 102 

Score = 39 (5.9 bits), Expect = 6.8e-18, Sum P(2) = 6.8e-18 
Identities = 12/41 (29%), Positives = 20/41 (48%) 

Query: 154 LFRSILLFLTRFTVLTATGWSLCRSLIHLFRTYSFLNLLFL 194 

L+ + LFL ++ + T W L +S H + +N FL 
Sbjct: 53 LYSVLALFL-QYLNIYKTLWWLPKSYWHYSLKFHLINPYFL 92 


Pedant information for DKFZphfbr2_82c20, frame 2 


Report for DKFZphfbr2_82c20.2 


[LENGTH] 492 
[MW] 56274.05 
[pI] 9.51 
[HOMOL] TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 4e 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] AMIDATION 2 
[PROSITE] MYRISTYL 5 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 7 
[KW] LOW_COMPLEXITY 8.74 % 


SEQ MGGRRGPNRTSYCRNPLCEPGSSGGSSGSHTSSASVTSVRSRTRSSSGTGLSSPPLATQT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccccccccccccee 

MEM 

SEQ VVPLQHCKIPELPVQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSHPPSHTSLNFH 

SEG 

PRD eeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMM 

SEQ LIDFNLLMVTTIVLGRRFIGSIVKEASQRGKVSLFRSILLFLTRFTVLTATGWSLCRSLI 

SEG 

PRD eeehhhhhhhhhhhhheeeehhhhhhhcccchhhhhhhhhhhhhhhhhhcccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ HLFRTYSFLNLLFLCYPFGMYIPFLQLNCDLRKTSLFNHMASMGPREAVSGLAKSRDYLL 

SEG 

PRD hhhhhhhhheeeeeeecccccceeeeccccchhhhhhhhhhccchhhhhhhhhhhhhhhh 

MEM 

SEQ TLRETWKQHTRQLYGPDAMPTHACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYV 

SEG 

PRD hhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhcchhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ AFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKAAAHLGCWQKV 

SEG 

PRD heeeeeeeeccccccchhhhhhhhhhhcchhhhhhhhhhccchhhhhhhhhhhhhhhccc 

MEM MMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DPALCSNVLQHPWTEECMWPQGVLVKHSKNVYKAVGHYNVAIPSDVSHFRFHFFFSKPLR 

SEG XX 

PRD ccccccccccccccceeecccceeeeeccceeeeccccccccccccccceeeeeecccch 

MEM MMMMMMMMMM 

SEQ ILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNYYAFFKLLRDRLVLGKAYSYS 

SEG xxxxxxxx 

PRD hhhhhhhhhhheeeeehhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 
SEQ ASPQRDLDHRFS 

SEG 

PRD ccchhhhhhccc 

MEM 


Prosite for DKFZphfbr2_82c20.2 


PS00001  8->12     ASN_GLYCOSYLATION  PDOC00001 
PS00002  47->51    GLYCOSAMINOGLYCAN  PDOC00002 
PS00004  212->216  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  316->320  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  38->41    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  147->150  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  241->244  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  245->248  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  443->446  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  241->245  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  273->277  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  342->346  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  21->27    MYRISTYL           PDOC00008 
PS00008  24->30    MYRISTYL           PDOC00008 
PS00008  28->34    MYRISTYL           PDOC00008 
PS00008  48->54    MYRISTYL           PDOC00008 
PS00008  231->237  MYRISTYL           PDOC00008 
PS00009  2->6      AMIDATION          PDOC00009 
PS00009  134->138  AMIDATION          PDOC00009 
PS00029  168->190  LEUCINE_ZIPPER     PDOC00029 
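
The PS00005 (PKC_PHOSPHO_SITE) hits reported for this clone can be reproduced with a short regular-expression scan. The sketch below assumes the PROSITE consensus for that entry, [ST]-x-[RK], and compares only the start positions with those listed above (38, 147, 241, 245 and 443). The helper name is hypothetical, the peptide is the frame 2 translation as given in this report (with residue 367 read as N, as in the alignments), and this is not a description of how the PEDANT pipeline itself performs the scan.

```python
import re

# Frame 2 peptide of DKFZphfbr2_82c20, 50 residues per line as in the report.
PEPTIDE = (
    "MGGRRGPNRTSYCRNPLCEPGSSGGSSGSHTSSASVTSVRSRTRSSSGTG"
    "LSSPPLATQTVVPLQHCKIPELPVQASILFELQLFFCQLIALFVHYINIY"
    "KTVWWYPPSHPPSHTSLNFHLIDFNLLMVTTIVLGRRFIGSIVKEASQRG"
    "KVSLFRSILLFLTRFTVLTATGWSLCRSLIHLFRTYSFLNLLFLCYPFGM"
    "YIPFLQLNCDLRKTSLFNHMASMGPREAVSGLAKSRDYLLTLRETWKQHT"
    "RQLYGPDAMPTHACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYV"
    "AFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKA"
    "AAHLGCWQKVDPALCSNVLQHPWTEECMWPQGVLVKHSKNVYKAVGHYNV"
    "AIPSDVSHFRFHFFFSKPLRILNILLLLEGAVIVYQLYSLMSSEKWHQTI"
    "SLALILFSNYYAFFKLLRDRLVLGKAYSYSASPQRDLDHRFS"
)

# Assumed PROSITE PS00005 consensus [ST]-x-[RK]; the lookahead keeps
# overlapping matches, which a plain finditer would skip.
PKC_SITE = re.compile(r"(?=[ST].[RK])")


def pkc_site_starts(peptide: str) -> list[int]:
    """Return 1-based start positions of every [ST]-x-[RK] match."""
    return [m.start() + 1 for m in PKC_SITE.finditer(peptide)]


sites = pkc_site_starts(PEPTIDE)
print(len(PEPTIDE), sites)
```

The same approach extends to the other short PROSITE patterns in the table by swapping in the corresponding regular expression.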


(No Pfam data available for DKFZphfbr2_82c20.2) 
DKFZphfbr2_82e17 


group: transmembrane protein 

DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with very weak similarity to C. 
elegans cosmid R01B10. 

The novel protein contains 6 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 


similarity to C. elegans "R01B10.5"; 

membrane regions: 6 

Summary DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with 
similarity to a hypothetical C. elegans protein. 

similarity to C. elegans "R01B10.5" 

complete cDNA, EST HS763158 extends the sequence, complete cds, EST 
hits 

six potential transmembrane domains 
Sequenced by DKFZ 

Locus: /map="779_C_?; 818_A_1; 877_C_1; 734_C_12; 760_E_11; 171.7 cR from top of Chr14 linkage 
group" 

Insert length: 1618 bp 

Poly A stretch at pos. 1608, polyadenylation signal at pos. 1588 

1 CTGATCTAGT GCTTCTCGAA AAAAACCTTC AGGCGGCCCA TGGCTGTCGA 
51 TATTCAACCA GCATGCCTTG GACTTTATTG TGGGAAGACC CTATTATTTA 

101 AAAATGGCTC AACTGAAATA TATGGAGAAT GTGGGGTATG CCCAAGAGGA 

151 CAGAGAACGA ATGCACAGAA ATATTGTCAG CCTTGCACAG AATCTCCTGA 

201 ACTTTATGAT TGGCTCTATC TTGGATTTAT GGCAATGCTT CCTCTGGTTT 

251 TACATTGGTT CTTCATTGAA TGGTACTCGG GGAAAAAGAG TTCCAGCGCA 

301 CTTTTCCAAC ACATCACTGC ATTATTTGAA TGCAGCATGG CAGCTATTAT 

351 CACCTTACTT GTGAGTGATC CAGTTGGTGT TCTTTATATT CGTTCATGTC 

401 GAGTATTGAT GCTTTCTGAC TGGTACACGA TGCTTTACAA CCCAAGTCCA 

451 GATTACGTTA CCACAGTACA CTGTACTCAT GAAGCCGTCT ACCCACTATA 

501 TACCATTGTA TTTATCTATT ACGCATTCTG CTTGGTATTA ATGATGCTGC 

551 TCCGACCTCT TCTGGTGAAG AAGATTGCAT GTGGGTTAGG GAAATCTGAT 

601 CGATTTAAAA GTATTTATGC TGCACTTTAC TTCTTCCCTA TTTTAACCGT 

651 GCTTCAGGCA GTTGGTGGAG GCCTTTTATA TTACGCCTTC CCATACATTA 

701 TATTAGTGTT ATCTTTGGTT ACTCTGGCTG TGTACATGTC TGCTTCTGAA 

751 ATAGAGAACT GCTATGATCT TCTGGTCAGA AAGAAAAGAC TTATTGTTCT 

801 CTTCAGCCAC TGGTTACTTC ATGCCTATGG AATAATCTCC ATTTCCAGAG 

851 TGGATAAACT TGAGCAAGAT TTGCCCCTTT TGGCTTTGGT ACCTACACCA 

901 GCCCTTTTTT ACTTGTTCAC TGCAAAATTT ACCGAACCTT CAAGGATACT 

951 CTCAGAAGGA GCCAATGGAC ACTGAGTGTA GACATGTGAA ATGCCAAAAA 
1001 CCTGAGAAGT GCTCCTAATA AAAAAGTAAA TCAATCTTAA CAGTGTATGA 
1051 GAACTATTCT ATCATATATG GGAACAAGAT TGTCAGTATA TCTTAATGTT 
1101 TGGGTTTGTC TTTGTTTTGT TTATGGTTAG ACTTACAGAC TTGGAAAATG 
1151 CAAAACTCTG TAATACTCTG TTACACAGGG TAATATTATC TGCTACACTG 
1201 GAAGGCCGCT AGGAAGCCCT TGCTTCTCTC AACAGTTCAG CTGTTCTTTA 
1251 GGGCAAAATC ATGTTTCTGT GTACCTAGCA ATGTGTTCCC ATTTTATTAA 
1301 GAAAAGCTTT AACACGTGTA ATCTGCAGTC CTTAACAGTG GCGTAATTGT 
1351 ACGTACCTGT TGTGTTTCAG TTTGTTTTTC ACCTATAATG AATTGTAAAA 
1401 ACAAACATAC TTGTGGGGTC TGATAGCAAA CATAGAAATG ATGTATATTG 
1451 TTTTTTGTTA TCTATTTATT TTCATCAATA CAGTATTTTG ATGTATTGCA 
1501 AAAATAGATA ATAATTTATA TAACAGGTTT TCTGTTTATA GATTGGTTCA 
1551 AGATTTGTTT GGATTATTGT TCCTGTAAAG AAAACAATAA TAAAAAGCTT 
1601 ACCTACATAA AAAAAAAA 


BLAST Results 


Entry HS981146 from database EMBL: 
human STS WI-6253. 
Length = 208 
Minus Strand HSPs: 

Score = 1040 (156.0 bits), Expect = 1.9e-40, P = 1.9e-40 
Identities = 208/208 (100%), Positives = 208/208 (100%), Strand = Minus / Plus 

Entry HSG20716 from database EMBL: 
human STS A006D06. 
Length = 195 
Minus Strand HSPs: 

Score = 975 (146.3 bits), Expect = 1.8e-37, P = 1.8e-37 
Identities = 195/195 (100%), Positives = 195/195 (100%), Strand = Minus / Plus 


Medline entries 


No Medline entry 


Peptide information for frame 1 


1 MAVDIQPACL GLYCGKTLLF KNGSTEIYGE CGVCPRGQRT NAQKYCQPCT 
51 ESPELYDWLY LGFMAMLPLV LHWFFIEWYS GKKSSSALFQ HITALFECSM 
101 AAIITLLVSD PVGVLYIRSC RVLMLSDWYT MLYNPSPDYV TTVHCTHEAV 
151 YPLYTIVFIY YAFCLVLMML LRPLLVKKIA CGLGKSDRFK SIYAALYFFP 
201 ILTVLQAVGG GLLYYAFPYI ILVLSLVTLA VYMSASEIEN CYDLLVRKKR 
251 LIVLFSHWLL HAYGIISISR VDKLEQDLPL LALVPTPALF YLFTAKFTEP 
301 SRILSEGANG H 


ORF from 40 bp to 972 bp; peptide length: 311 
Category: similarity to unknown protein 
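
The relationship between the ORF start coordinate and the frame 1 peptide can be illustrated by translating the first codons of the insert. The sketch below assumes the standard genetic code and 1-based coordinates; the function and variable names are hypothetical, and only the first 70 nucleotides of the insert sequence above are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the report does not name the translation table it used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int) -> str:
    """Translate dna from a 1-based start position until a stop codon or the end."""
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon terminates translation
            break
        peptide.append(residue)
    return "".join(peptide)


# First 70 nt of the DKFZphfbr2_82e17 insert, copied from the listing above.
INSERT_HEAD = (
    "CTGATCTAGTGCTTCTCGAAAAAAACCTTCAGGCGGCCCATGGCTGTCGA"
    "TATTCAACCAGCATGCCTTG"
)

# The ORF is reported to start at position 40.
print(translate(INSERT_HEAD, 40))
```

The translated residues agree with the first ten residues of the frame 1 peptide listed above (MAVDIQPACL), which supports reading the ORF start as position 40.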


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82e17, frame 1 

TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid 
R01B10., N = 1, Score = 399, P = 1.4e-36 


>TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 
Length = 670 

HSPs: 

Score = 399 (59.9 bits), Expect = 1.4e-36, P = 1.4e-36 
Identities = 95/280 (33%), Positives = 152/280 (54%) 

Query: 2 AVDIQPACLGLYCGKTLLFKN GSTEIYGECGVCPRGQRTNAQKYCQPC 49 

A IQP+CLG +CG+T+L N GST + CG C G R NA C+ C 

Sbjct: 292 ASTIQPSCLG-FCGRTVLVGNYSEDVEATTTAAGSTSL-SRCGPCSFGYRNNAMSICESC 349 

Query: 50 TESPELYDWLYLGFMAMLPLVLHWFFIEWYSGKKSSSALFQ— HITALFECSMAAIITL 106 

+ YDW+YL F+A+LPL+LH FX + K + ++ ++ + E +A +1 + 
Sbjct: 350 DTPLQPYDWMYLLFIALLPLLLHMQFIR-IARKYCRTRYYEVSEYLCVILENVIACVIAV 408 

Query: 107 LVSDPVGVLYIRSCRVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLV 166 

L+ P C + +WY YNP Y T+ CT^-E V-fPLY+I FI++ + 

Sbjct: 409 LIYPPRFTFFLNGCSKTDIKEWYPACYNPRIGYTKTMRCTYEVVFPLYSITFIHHLILIG 468 

Query: 167 LMMLLRPLLVKKIACGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSL 226 

+++LR L + L K+ K YAA+ PIL V+ AV G+++Y FPYI+L+ SL 
Sbjct: 469 SILVLRSTLYCVL— LYKTYNGKPFYAAIVSVPILAVIHAVLSGWFYTFPYILLIGSL 525 

Query: 227 VTLAVYMSASEIENCYDLLVR KKRLIVLFSHWLLHAYGIISI 268 

+ +++ +++VR LI L L+ ++G+I+I 

Sbjct: 526 WAMCFHLALEGKRPLKEMIVRIATSPTHLIFLSITMLMLSFGVIAI 571 


Pedant information for DKFZphfbr2_82e17, frame 1 


Report for DKFZphfbr2_82e17.1 

[LENGTH] 311 
[MW] 35239.14 
[pI] 7.91 
[HOMOL] TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 9e-36 
[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 6 
[KW] LOW_COMPLEXITY 7.72 % 


SEQ MAVDIQPACLGLYCGKTLLFKNGSTEIYGECGVCPRGQRTNAQKYCQPCTESPELYDWLY 
SEG 
PRD cccccccccccccccceeeeccccceeecccccccccccccceeecccccccccchhhhh 
MEM MMMMMM 

SEQ LGFMAMLPLVLHWFFIEWYSGKKSSSALFQHITALFECSMAAIITLLVSDPVGVLYIRSC 
SEG 
PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeece 
MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLVLMMLLRPLLVKKIA 
SEG xxxxxxxxxxxx 
PRD eeeeecceeeeecccccceeeeeeeceeeeeeeeceeeeehhhhhhhhhhhhhhhhhhee 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ CGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSLVTLAVYMSASEIEN 
SEG 
PRD eecccccchhhhhhhhhhhccccccccccccceeeecceeeeehhhhhhhhhhhhhhhhh 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ CYDLLVRKKRLIVLFSHWLLHAYGIISISRVDKLEQDLPLLALVPTPALFYLFTAKFTEP 
SEG XXXXXXXXXXXX 
PRD hhhhhhhhhhhhhhhhhhhhhhcccceeeechhhhhhceeeeeecccceeeeeeeccccc 
MEM MMMMMMMMMMMMMMMMMMMMM MMHMPOIMMMMMMMMMMMMMMM 

SEQ SRILSEGANGH 
SEG 
PRD ceeeeeccccc 
MEM MM 


Prosite for DKFZphfbr2_82e17.1 


PS00001  22->26    ASN_GLYCOSYLATION  PDOC00001 
PS00004  82->86    CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  80->83    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  119->122  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  186->189  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  294->297  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  234->238  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  236->240  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  269->273  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  11->17    MYRISTYL           PDOC00008 
PS00008  37->43    MYRISTYL           PDOC00008 
PS00008  182->188  MYRISTYL           PDOC00008 
PS00009  80->84    AMIDATION          PDOC00009 


(No Pfam data available for DKFZphfbr2_82e17.1) 
DKFZphfbr2_82e4 


group: signal transduction 

DKFZphfbr2_82e4 encodes a novel 473 amino acid protein with strong similarity to 
calmodulin-binding proteins. 

The novel protein is similar to human and rat Ca2+/calmodulin-dependent protein kinase (EC 
2.7.1.123), rat calmodulin-binding protein, calmodulin binding protein kinase of Fugu rubripes 
and Rattus norvegicus calcium/calmodulin-dependent protein kinase I. Calmodulin is the 
archetype of the family of calcium-modulated proteins, of which nearly 20 members have been 
found. Calmodulin is involved in regulation of growth and the cell cycle as well as in signal 
transduction and the synthesis and release of neurotransmitters. The novel protein seems to be 
involved in calmodulin-mediated pathways in human neuronal cells. 

The new protein can find clinical application in modulating/blocking calmodulin-mediated 
pathways in human neuronal cells. 


strong similarity to calmodulin-binding proteins 

complete cDNA, complete cds, EST hits 
splice variant in comparison to rat 156542 
ESTs HSZZ54543/HS1141907 define splice variant 
see also DKFZphfbr2_82g20 unspliced form 

Sequenced by DKFZ 

Locus: /map="200.5 cR from top of Chr3 linkage group" 
Insert length: 2923 bp 

Poly A stretch at pos. 2913, polyadenylation signal at pos. 2890 


1 ATGCTGGAGG TTCGCTAGCC GAAGCGGCTG CATCTGGCGC CGCGTCTGCC 
51 CCGCGTGCTC GGAGCGGATT CTGCCCGCCG TCCCCGGAGC CCTCGGCGCC 
101 CCGCTGAGCC CGCGATCACT TCCTCCCTGT GACCAACCGG CGCTGCAGGT 
151 TAGAGCCTGG CAATGCCGTT TGGGTGTGTG ACTCTGGGTG ACAAGAAGAA 
201 CTATAACCAG CCATCGGAGG TGACTGACAG ATATGATTTG GGACAGGTCA 
251 TCAAGACTGA GGAGTTTTGT GAAATCTTCC GGGCCAAGGA CAAGACGACA 
301 GGCAAGCTGC ACACCTGCAA GAAGTTCCAG AAGCGGGACG GCCGCAAGGT 
351 GCGGAAAGCT GCCAAGAACG AGATAGGCAT CCTCAAGATG GTGAAGCATC 
401 CCAACATCCT ACAGCTGGTG GATGTGTTTG TGACCCGCAA GGAGTACTTT 
451 ATCTTCCTGG AGCTGGCCAC GGGGAGGGAG GTGTTTGACT GGATCCTGGA 
501 CCAGGGCTAC TACTCGGAGC GAGACACAAG CAACGTGGTA CGGCAAGTCC 
551 TGGAGGCCGT GGCCTATTTG CACTCACTCA AGATCGTGCA CAGGAATCTC 
601 AAGCTGGAGA ACCTGGTTTA CTACAACCGG CTGAAGAACT CGAAGATTGT 
651 CATCAGTGAC TTCCATCTGG CTAAGCTAGA AAATGGCCTC ATCAAGGAGC 
701 CCTGTGGGAC CCCCGAGTAT CTGGGCAACC CACCTTTCTA TGAGGAGGTG 
751 GAAGAAGATG ATTATGAGAA CCATGATAAG AATCTCTTCC GCAAGATCCT 
801 GGCTGGTGAC TATGAGTTTG ACTCTCCATA TTGGGATGAT ATTTCGCAGG 
851 CAGCCAAAGA CCTGGTCACA AGGCTGATGG AGGTGGAGCA AGACCAGCGG 
901 ATCACTGCAG AAGAGGCCAT CTCCCATGAG TGGATTTCTG GCAATGCTGC 
951 TTCTGATAAG AACATCAAGG ATGGTGTCTG TGCCCAGATT GAAAAGAACT 
1001 TTGCCAGGGC CAAGTGGAAG AAGGCTGTCC GAGTGACCAC CCTCATGAAA 
1051 CGGCTCCGGG CACCAGAGCA GTCCAGCACG GCTGCAGCCC AGTCGGCCTC 
1101 AGCCACAGAC ACTGCCACCC CCGGGGCTGC AGGTGGGGCC ACAGCTGCAG 
1151 CTGCGAGTGG AGCTACCTCA GCCCCTGAGG GTGATGCTGC TCGTGCTGCA 
1201 AAGAGTGATA ATGTGGCCCC CGCAGACCGT AGTGCCACCC CAGCCACAGA 
1251 TGGAAGTGCC ACCCCAGCCA CTGATGGCAG TGTCACCCCA GCCACCGATG 
1301 GAAGCATCAC TCCAGCCACT GATGGGAGTG TCACCCCAGC CACTGACAGG 
1351 AGCGCTACTC CAGCCACTGA TGGGAGAGCC ACACCAGCCA CAGAAGAGAG 
1401 CACTGTGCCC ACCACCCAAA GCAGTGCCAT GCTGGCCACC AAGGCAGCTG 
1451 CCACCCCTGA GCCGGCTATG GCCCAGCCGG ACAGCACAGC CCCAGAGGGC 
1501 GCCACAGGCC AGGCTCCACC CTCTAGTAAA GGGGAAGAGG CTGCTGGTTA 
1551 TGCCCAGGAG TCTCAAAGGG AGGAGGCCAG CTGAGTAGGC AGCCTGGTGA 
1601 GGGGGGGCAG GGGATGGGCA GGAGGGTGGG AGAGTGGATG AGGGGCTTCT 
1651 CACTGTACAT AGAGTCACTG GCATGATGCC CTCGCTCCCC CATGCCCCCA 
1701 CATCCCAGTG GGGCATAACT AGGGGTCACG GGAGAGCAGT CTCGTCTCCT 
1751 GTGTGTATGT GTGTGAGTGG TGGGCAGGCC AGTGGCAGGG CCGGCCCCAG 
1801 CCCCTGCATG GATTCCTTGT GGCTTTTCTG TCTTTTGCTA GCTTCACCAG 
1851 TTTCTGTTCC TTGTGGGATG CTGCTCTAGG GATACTCAGG GGGCTCCTGC 
1901 TCTCCTTCCC CTTCCCTTCT TGCCTCACCA TTCCCCTAGG CAGGCCCTGC 
1951 AGGTCCCACA CTCTCCCAGG CCCTAAACTT GGGCGGCCTT GCCCTGAGAG 
2001 CTGGTCCTCC AGCGAGGCCC TGTCAGCGGT CTTAGGCTCC TGCACATGAA 
2051 GGTGTGTGCC TGTGGTGTGT GGGCTGCTCT AGGAGCAGAT ACAGGCTGGT 
2101 ATAGAGGATG CAGAAAGGTA GGGCAGTATG TTTAAGTCCA GACTTGGCAC 
2151 ATGGCTAGGG ATACTGCTCA CTAGCTGTGG AGGTCCTCAG GAGTGGAGAG 
2201 AATGAGTAGG AGGGCAGAAG CTTCCATTTT TGTCCTTCCT AAGACCCTGT 
2251 TATTTGTGTT ATTTCCTGCC TTTCCGAGTC CTGCAGTGGG CTGCCCTGTA 
2301 CCCTGAACCT CATGAGCCTC TAAGGGAAAG GAGGAACAAT TAGGACGTGG 
2351 CAATGAGACC TGGCAGGGCA GAGTACAAGC CCAGCACCCA GTGTCCCAGC 
2401 CTTACTGGGT CCTTACCCTG GGCCAAACAG GGAGGGCTGA TACCTCCTTG 
2451 CTCTTCCTAG ATGCCCACCT CCTACAATCT CAGCCCACAA GTCCTCTCCA 
2501 CCCTAGGGGG CTTGCTGCAT GGCAATAACT CATAATCTGA TTTGGAGGTT 
2551 TGCCCTTTAC AGGGGCAGAT TTTCTGCTCA GTTCAACAAT GAAATGAAGA 
2601 GGAACTCCCT CTTTCTACAG CTCACTTCTA TCAGAGGCCC AGGTGCCTCA 
2651 GAGCCACATT GAGTTGCTTT TTCTGGGATG AGGAAGTAGG GTTAAACTCC 
2701 CCAGTTTCCT GAGGGAGGCT CCTGACAGGT GCCCTTTGTC AGACCCTACC 
2751 ACAGCCTGGA TAGGCAGCCA CATTGGTCCT CGCCCTTGCT CGGCACTCCG 
2801 TGGTGGTCCT GCCCTTCTCC CTGCATGCCT GTGGGTCTGC TCTGGTGTGT 
2851 GAAGGTCGGT GGGTTAACTG TGTGCCTACT GAACCTGGCA AATAAACATC 
2901 ACCCTGCAAA GCCAAAAAAA AAA 


BLAST Results 


Entry HS452352 from database EMBL: 
human STS WI-15318. 
Length = 350 
Minus Strand HSPs: 

Score = 1547 (232.1 bits), Expect = 5.2e-63, P = 5.2e-63 
Identities = 331/348 (95%), Positives = 331/348 (95%), Strand = Minus / Plus 


Medline entries 


94110847; 

J Neurosci 1994 Jan;14(1):1-13 

1G5: a calmodulin-binding, vesicle-associated, protein 
kinase-like protein enriched in forebrain neurites. 

Godbout M, Erlander MG, Hasel KW, Danielson PE, Wong KK, Battenberg EL, 
Foye PE, Bloom FE, Sutcliffe JG 


Peptide information for frame 1 


1 MPFGCVTLGD KKNYNQPSEV TDRYDLGQVI KTEEFCEIFR AKDKTTGKLH 
51 TCKKFQKRDG RKVRKAAKNE IGILKMVKHP NILQLVDVFV TRKEYFIFLE 
101 LATGREVFDW ILDQGYYSER DTSNVVRQVL EAVAYLHSLK IVHRNLKLEN 
151 LVYYNRLKNS KIVISDFHLA KLENGLIKEP CGTPEYLGNP PFYEEVEEDD 
201 YENHDKNLFR KILAGDYEFD SPYWDDISQA AKDLVTRLME VEQDQRITAE 
251 EAISHEWISG NAASDKNIKD GVCAQIEKNF ARAKWKKAVR VTTLMKRLRA 
301 PEQSSTAAAQ SASATDTATP GAAGGATAAA ASGATSAPEG DAARAAKSDN 
351 VAPADRSATP ATDGSATPAT DGSVTPATDG SITPATDGSV TPATDRSATP 
401 ATDGRATPAT EESTVPTTQS SAMLATKAAA TPEPAMAQPD STAPEGATGQ 
451 APPSSKGEEA AGYAQESQRE EAS 

ORF from 163 bp to 1581 bp; peptide length: 473 
Category: strong similarity to known protein 


BLASTP hits 
Entry S50193 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - rat 
Length = 374 

Score = 371 (130.6 bits), Expect = 2.2e-66, Sum P(2) = 2.2e-66 
Identities = 74/176 (42%), Positives = 115/176 (65%) 

Entry S57347 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - human 
Length = 370 

Score = 369 (129.9 bits), Expect = 4.6e-66, Sum P(2) = 4.6e-66 
Identities = 74/176 (42%), Positives = 114/176 (64%) 


Alert BLASTP hits for DKFZphfbr2_82e4, frame 1 

PIR: 156542 calmodulin-binding protein - rat, N = 2, Score = 1246, P = 
4e-228 

TREMBLNEW:FRU010348_3 product: "calmodulin binding protein kinase"; 
Fugu rubripes UBE1-like gene, PRGFR2 gene and gene encoding calmodulin 
binding protein kinase, clone 168J21, N = 2, Score = 846, P = 2.6e-139 

TREMBL:RNPRKI_1 product: "protein kinase 1"; Rattus norvegicus 
calcium/calmodulin-dependent protein kinase I mRNA, complete cds., N = 
2, Score = 364, P = 5.1e-63 


>PIR: 156542 calmodulin-binding protein - rat 
Length = 504 

HSPs: 

Score = 1246 (186.9 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228 

Identities = 255/289 (88%), Positives = 259/289 (89%) 

Query: 188 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 247 

GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 
Sbjct: 216 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 275 

Query: 248 TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSSTA 307 

TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQS TA 
Sbjct: 276 TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSGTA 335 

Query: 308 AAQSASATDTATPGAAGGATAAAASGATSAPE GDAARAAKSDNVAPADRSAT 359 

A +D ATPGAAGGA AAAA GA A GDA AAKSD+^-A ADRSAT 

Sbjct: 336 AT SDAATPGAAGGAVAAAAGGAAPASGASATVGTGGDAGCAAKSDDMASADRSAT 390 

Query: 360 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQ 419 

PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVP Q 
Sbjct: 391 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPAAQ 450 

Query: 420 SSAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS 473 
SSA  A KAAATPEPA+AQPDSTA EGATGQAPPSSKGEEA G AQESQR E S 

Sbjct: 451 SSAAPAAKAAATPEPAVAQPDSTALEGATGQAPPSSKGEEATGCAQESQRVETS 504 

Score = 978 (146.7 bits). Expect = 4.0e-228, Sum P(2) = 4.0e-228 
Identities = 186/187 (99%), Positives = 187/187 (100%) 

Query: 1 MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60 

MPFGCVTLGDKKNYNQPSEVTDRYDLGQV+KTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 
Sbjct: 1 MPFGCVTLGDKKNYNQPSEVTDRYDLGQVVKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60 

Query: 61 RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120 

RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 
Sbjct: 61 RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120 

Query: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 180 

DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 
Sbjct: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 180 

Query: 181 CGTPEYL 187 

CGTPEYL 
Sbjct: 181 CGTPEYL 187 


Pedant information for DKFZphfbr2_82e4, frame 1 


Report for DKFZphfbr2_82e4.1 


[LENGTH] 473 
[MW] 51208.89 
[pI] 5.30 
[HOMOL] PIR: 156542 calmodulin-binding protein - rat 0.0 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 10.99 Other signal-transduction activities [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 2e-26 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 2e-26 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDL101c] 8e-26 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 5e-24 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 7e-23 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 7e-23 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 1e-21 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 1e-21 
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trUNCATJ 
tFUNCAT] 
3e-19 
(FUNCAT] 
IFUNCAT) 
I FUNCAT] 
I FUNCAT J 
(FUNCAT J 
I FUNCAT] 
[FUNCAT] 
[FUNCAT) 

(S. 

(FUNCAT) 
I FUNCAT) 
[FUNCAT) 
YPL031C] 7e-08 
(FUNCAT) 
[ FUNCAT) 
7e-08 
(FUNCAT) 
palmitylation, 
[FUNCAT] 
(FUNCAT) 
(FUNCAT) 
(FUNCAT) 
cerevisiae, 
[FUNCAT] 
5e-06 
[FUNCAT) 
(FUNCAT) 
[FUNCAT) 
(FUNCAT) 
le-05 
[FUNCAT) 
YNL183C) 8e-05 
(FUNCAT) 
8e-05 
(FUNCAT) 
(FUNCAT] 
(BLOCKS) 
(BLOCKS] 
(SCOP] 
[SCOP] 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
[SCOP J 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(EC) 
[EC] 
[EC] 
(EC) 
(EC) 
[EC] 
(PIEIKWI 
[PIRKW] 
(PIRKW) 
(PIRKW) 
(PIRKW) 
[PIRKW] 
[PIRKW) 
[PIRKW] 
(PIRKW) 
[PIRKW) 
[PIRKW] 
(PIRKW) 
(PIRKW) 
(PIRKW) 
[PIRKW] 
(PIRKW) 
(PIRKW) 
(PIRKW) 
(PIRKW) 


11.01 stress response (S. cerevisiae, YDR477w] 3e-19 
01.05.04 regulation of carbohydrate utilization (s. 


cerevisiae, YDR477w) 


99 unclassified proteins (S. cerevisiae, YPH41c)* le-16 

03.16 dna synthesis and replication (S. cerevisiae, YMROOlc) 3e-16 
03.13 meiosis (S. cerevisiae, YOR351c) le-15 

30.02 organization of plasma membrane (S. cerevisiae, YDR122w) 3e-14 

10.03.11 key Jcinases (S. cerevisiae, YCR073c) 6e-ll 
09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 8e-ll 

10.02.11 key kinases [S. cerevisiae, YJL095w) 2e-09 

03.07 pheromone response, mating-type determination, sex-specific proteins 
cerevisiae, YLR362w) le-08 

10.05.11 key kinases [S. cerevisiae, YLR362w) le-08 
10.04,11 key kinases [S. cerevisiae, YLR362w) le-08 

02.19 metabolism of energy reserves (glycogen, trehalose) (S. cerevisiae, 

04.05.01.04 transcriptional control (S. cerevisiae, YPL031c) 7e-08 
01.04.04 regulation of phosphate utilization (s. cerevisiae, YPL031c) 

06.07 protein modification (glycolsylation, acylation, myristylation, 
farnesylation and processing) [S. cerevisiae, YFL033cJ le-07 

04.99 other transcription activities (S. cerevisiae, YFL033c] le-07 
10.05.09 regulation of g-protein activity (S. cerevisiae, YBL016w) 5e-07 
05.07 translational control [S. cerevisiae, YDR283c] 8e-07 
YHR079°r5° regulation of lipid, fatty-acid and sterol biosynthesis [S. 

30.07 organization of endoplasmatic reticulum (S. cerevisiae, YHR079c) 

30.01 organization of cell wall [S. cerevisiae, YiR019c) le-05 

30.90 extracellular/secretion proteins (S. cerevisiae, YIR019c) le-05 

01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c) le-05 
04.05.01.01 general transcription activities (s. cerevisiae, YDL108w] 


01.02.04 regulation of nitrogen and sulphur utilization 
08.99 other intracellular-transport activities 


(S. cerevisiae. 


[S. cerevisiae, YNL183c) 

03.10 sporulation and germination (S. cerevisiae, YDR523c) 2e-04 
c energy conversion [M, genitalium, MG109] 3e-04 
BL00107A Protein kinases ATP-binding region proteins 
BL00939F 

dlgol 5.1.1.1.9 MAP kinase Erk2 (rat Rattus norvegicus 3e-62 

dlwfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 5e-59 

dlkoa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain (Caenorhabditi le-75 
dlkoba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har le-72 

^lp*i*^ 5.1,1.1.5 gamma-subunit of glycogen phosphorylase kinas 4e-65 

5.1.1.2.4 insulin receptor (Human (Homo sapiens) 2e-56 

dlapme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 4e-71 
dlfgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 (human (Horn le-50 
dlydre_ 5.1.1.1.3 cAMP- dependent PK, catalytic subunit [bovine (Bo 3e-70 
dlfmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Horn 5e-49 
dlcdkb_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit (pig (Su 2e-72 
d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck (huma 5e-46 

<^^csn 5.1.1.1.11 Casein kinase-1, CKl [Schizosaccharomyces pombe 9e-42 

dljsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) le-56 
dlckia_ 5.1.1.1.10 Casein kinase-1, CKl [rat (Rattus norvegicus) 9e-52 
2.7.1.38 Phosphorylase kinase 3e-29 

2.7.1.123 Ca2+/calmodulin-dependent protein kinase Be-6e 
2.7.1.128 (Acetyl-CoA carboxylase) kinase 2e-17 
2.7.1.117 Myosin-light-chain kinase 2e-38 

2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH) ] kinase 2e-17 
2.7,1.37 Protein kinase 6e-28 
phosphotransferase 8e-66 
nucleus 2e-24 
transferase 8e-30 - 
calcium 2e-27 
duplication 4e-19 
tandem repeat 2e-31 
phorbol ester binding le-16 
zinc le-16 

cell cycle control 2e-20 

serine/threonine-specific protein kinase 8e-66 
phospholipid binding le-16 
autophosphorylation 8e-66 
brain le-14 
heterotetramer 2e-16 
polymer 3e-29 
mitosis 2e-20 
magnesium 7e-22 
ATP 8e-66 

alternative initiators le-29 
[PIRKW] phosphoprotein 8e-66 

[ PIRKW) 



frl vffynrrit'e^i rt dA— 1 Q 



[PIRKW] protein kinase 2e-28 
[PIRKW] testis 3e-28 

f PIRKWl 


[PIRKW] cAMP binding 1e-16 

f PIRKWl 


(PIRKW] 

s t met Ural Di'ohAin 4p — IQ 

[PIRKW] calcium binding 3e-45 

r PTRKWl 


I PIRKWl 


r PT RK^W 1 
I IT 1 r\I\V» J 

xxpopx.ot6xn ^€ A D 

r PT RIfUl 
[ tr I r\I\W J 

\ mil c^Ia ^ — ^ Q 
CaxuxcIC muSCXc 46 X9 

f PTRKWl 


I PT RKMl 

my XX 5 cy X a c ion iq 

r PTRvwi 

I c 1 t\r\.v% J 

c<r itcirio 

r PTRKU} 

UXVXSXUll ^ C JO 

[PIRKW] calmodulin binding 8e-66 

1 1 i rvrvvv j 

smooth ^l muscle VS'OX 


[SUPFAM] fibronectin type III repeat homology 7e-31 
[SUPFAM] immunoglobulin homology 7e-31 
[SUPFAM] ribosomal protein S6 kinase II 3e-26 
[SUPFAM] calcium-dependent protein kinase 5e-29 
[SUPFAM] AMP-activated protein kinase 7e-22 
[SUPFAM] protein kinase akt 1e-14 
[SUPFAM] protein kinase SPK1 3e-20 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-36 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-45 
[SUPFAM] calmodulin repeat homology 5e-29 
[SUPFAM] protein kinase DUN1 2e-24 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-14 
[SUPFAM] death-associated protein kinase 2e-31 
[SUPFAM] myosin-light-chain kinase, nonmuscle 1e-29 
[SUPFAM] pleckstrin repeat homology 1e-14 
[SUPFAM] ankyrin repeat homology 2e-31 
[SUPFAM] protein kinase homology 8e-66 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-36 
[SUPFAM] twitchin 1e-18 
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-16 

iSUre AMJ 

titin se—iy 

[SUPFAM] protein kinase cdr1 2e-20 
[SUPFAM] kinase-related transforming protein 2e-38 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-66 
[SUPFAM] kinase interaction domain homology 2e-24 
[SUPFAM] protein kinase C mu 1e-16 

[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PFAM] Eukaryotic protein kinase domain 
[KW] All_Alpha 
[KW] 3D 
[KW] LOW_COMPLEXITY 7.40 % 


SEQ MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 

SEG 

Ia06- CEETTTGGGCEBEEEECBCGGGGGEEEEEETTTTCEEEEEEEEC 

SEQ RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 

SEG 

Ia06- HHHHHHHHHCCTTTBCCEEEEEEETTEEEEEECCCCCEEHHHHHHHTTTTBHH 

SEQ DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 

SEG 

Ia06- HHHHHHHHHHHHHHHHHHHCCCTTTTTTTTEEECCCTTTTCEEECCCTTTTCHHHHHCCC 

SEQ CGTPEYLGNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLME 

SEG 

Ia06- HHHHHHHCCTTTTTT THHHHHHHHHCCCCCCTTTTTTTTCHHHHHHHHHHCT 

SEQ VEQDQRITAEEAISHEWISGNiRASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRA 

SEG 

Ia06- TTGGGCCCHHHHHHTTTTTTCCCCCCBHHHHHHHHHHHHHCCTTTTTTBTHHHHHHHC . . 

SEQ PEQSSTAAAQSASATDTATPGAAGGATAAAASGATSAPEGDAARAAKSDNVAPADRSATP 

SEG . .xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Ia06- 
SEQ ATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQS

SEG 

Ia06- 

SEQ SAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS 

SEG 

Ia06- 


Prosite for DKFZphfbr2_82e4.1


PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005
PS00005 91->94 PKC_PHOSPHO_SITE PDOC00005
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005
PS00005 118->121 PKC_PHOSPHO_SITE PDOC00005
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005
PS00005 264->267 PKC_PHOSPHO_SITE PDOC00005
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005
PS00005 454->457 PKC_PHOSPHO_SITE PDOC00005
PS00005 467->470 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 118->122 CK2_PHOSPHO_SITE PDOC00006
PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006
PS00006 455->459 CK2_PHOSPHO_SITE PDOC00006
PS00006 467->471 CK2_PHOSPHO_SITE PDOC00006
PS00007 456->464 TYR_PHOSPHO_SITE PDOC00007
PS00007 127->136 TYR_PHOSPHO_SITE PDOC00007
PS00008 260->266 MYRISTYL PDOC00008
PS00008 321->327 MYRISTYL PDOC00008
PS00008 324->330 MYRISTYL PDOC00008
PS00009 59->63 AMIDATION PDOC00009


Pfam for DKFZphfbr2_82e4.1


HMM_NAME Eukaryotic protein kinase domain

HMM * YeigRi r GeGsFGtVYkCiWr . TGel VAIKI I kkr sms FlREIq
Y +G++I F ++++++++ TG++ K++ KR+ + +EI
Query 24 YDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDGRKVRKAAKNEIG 72

HMM IMRrLnHPNITRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEwe
I+++++HPNI+++ D+F + +-♦'+ + +E++ G + FD+I ++G++SE++
Query 73 ILKMVKHPNILQLVDVFV-TRKEYFIFLELATGREVFDWILDQGYYSERD 121

HMM IrflMyQILrGMeYLHSMgllHRDLKPENILIDeN. . . gqIKIcDFGUVR
++++Q+L++++YLHS +I+HR LK EN+ + ++ I I+DF LA+
Query 122 TSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAK 171

HMM qMnnYerMttfCGTPWY*
+ N ++ + CGTP+Y
Query 172 LEN--GLIKEPCGTPEY 186

HMM *GepPFyd ■ dnMemlmrliqrfrrpfWpnCSeElyDFMr
G PPFY+ + +++I++++++F +P+W+ +S ++D+++
Query 188 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVT 236

HMM wCWnyDPekRPTFrQILnHPWF*
+++++ ++R+T+++++ H W+
Query 237 RLMEVEQDQRITAEEAISHEWI 258
DKFZphfbr2_82g14


group: transmembrane protein 

DKFZphfbr2_82g14 encodes a novel 208 amino acid proline-rich protein without similarity to
known proteins.

The protein contains one transmembrane domain.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes and as a new marker for neuronal cells.


unknown proline rich protein
membrane regions: 1

Summary DKFZphfbr2_82g14 encodes a novel 208 amino acid protein.


unknown proline rich protein

complete cDNA, complete cds, EST hits

TRANSMEMBRANE 1
Sequenced by DKFZ

Locus: /map="26.2 cR from top of Chr16 linkage group"
Insert length: 2059 bp 

Poly A stretch at pos. 2049, polyadenylation signal at pos. 2024 


1 AGAAGTGCGA CTGCCAGCTG CCGAGGCGTT CGGTCCTGCT GTTGCGGCCG 
51 CTGCCCCAGG GCTGCGGGGA CGCTCCCGGA GCCCTGCCTG TCCCCTGTCC 
101 ATCCAGGCCA GCAGCTGAAG GAGCCTCACC TGCCTCCCTT CTCTGAGTAG 
151 CACGGATTTG AGGAGAAGCA GCGAAGATGT CCAGCGAGCC TCCCCCTCCT 
201 TATCCTGGGG GCCCCACAGC CCCACTTCTG GAAGAGAAAA GTGGAGCCCC 
251 GCCCACCCCA GGCCGTTCCT CCCCAGCTGT GATGCAGCCC CCTCCAGGCA 
301 TGCCACTGCC CCCTGCGGAC ATTGGCCCCC CACCCTATGA GCCGCCGGGT 
351 CACCCAATGC CCCAGCCTGG CTTCATCCCA CCACACATGA GTGCAGATGG 
401 CACCTACATG CCTCCGGGTT TCTACCCTCC TCCAGGCCCC CACCCACCCA 
451 TGGGCTACTA CCCCCCAGGG CCCTACACGC CAGGGCCCTA CCCTGGCCCT
501 GGGGGCCACA CAGCCACAGT CCTGGTCCCT TCAGGAGCTG CCACCACGGT 
551 GACAGTGCTG CAGGGAGAGA TCTTTGAGGG AGCGCCTGTG CAGACGGTGT 
601 GTCCCCACTG CCAGCAGGCC ATCGCCACCA AGATCTCCTA CGAGATTGGC 
651 TTGATGAATT TCGTGCTGGG TTTCTTCTGT TGCTTCATGG GATGTGATCT 
701 GGGCTGCTGC CTGATCCCCT GCCTCATCAA TGACTTCAAG GATGTGACGC 
751 ACACATGCCC CAGCTGCAAA GCCTACATCT ACACGTACAA GCGCCTGTGC 
801 TAACGGAGCT GGGACTCGGG ACTCCCCCGC CTGTCAGTCT GGCCCCCTGT 
851 GCTTTGCTCC CTGCGCTCAG TGGTCACTTT CCCGCTCCCA CTTGGGGCTG 
901 GGAGCCGTGC CACCATCCCC TAGAAGTCCT GTCCTCTTCA CCCTGCCCTA 
951 CCTGAGCCGC TGACTCTTCT GGCAAAAATT CTGTTGGGAT TTAAGGCCAA 
1001 GGGTCAGTGG GTGGCAGGGG GCTGGCAATG AGCTTGTGTG TTGTTGGTCT 
1051 GCTTGGTGTG TGTGATCGGG AAGATAAGCT GGGAGGGGTC TCGTGCTGGG 
1101 GTCCTGATGC CTCTGTTTCC AAACAAGGTA CAGGTTCAGT CCAGACTCTT 
1151 TCCCCCTGGG ACCAACAGCA GCCAGAGCAG TTAGCCAGTT AGTCCCCAGG 
1201 CCTGTGGCCA CAGGCGTTTC TGACCTGCTG GGCCGAGAAT GGGTAAGTTG 
1251 TCTGGAGTCA GGTGGGCCCA CGTAGGACAG GGTCACAAAG CCTGGGTTTG 
1301 TTTCTGGGTA CTTTGCGCCT CTGGGGTGCT AGAGGTGGGG CATGGTGGCT 
1351 GGAAGTAAAA CTGCCAACTC TGGCCCTCAG AACTCTCAGG TATAGAAGCC 
1401 CAGGATGTCT AATACCCTGT CCCAGTGCCC GAGAGCTGCC TGGTGTCAGG 
1451 TAGAGAGGAC ACTGTACCTG GGTGAATGAT CAGACCCTGG TAGCTAAGAA 
1501 GGAACTTGTC CCTTTGAGTC AGTGTGCAGA CCCCCTTTCA GGCCATGCCT 
1551 CTGTGAACCC TGTATTGCTG GGGCCGGAAG GAGCCCCTGA GCCTAGCCCC 
1601 TTCCCGTCTG CCCTGTGTCC TCACTGCGTG TGGGTATGAC CTCTGCCTGG 
1651 TGGCTGGTGT ATCCCAACTG GGCAAGAGAT GGCAGAGGGT CCCCCTTGTG 
1701 GGTGCGCTTG GATGTGCAGA GCCTTCTCCA TGGATTTTCT TCCCTGTAAG 
1751 TGCCGGGCCC CCCACCCCAG CTGACAGGCT GTTGCTGTGC CTGCTCACAC 
1801 CTGCTCCTGC AGGCACACTG GGCTAGGGAC GAGGAAGGAG CAGCCACAAG 
1851 TGGTAGAACT GCCTTGGTGG ACACCAGCCT CGCCCTGTCT TTATTTCCTG 
1901 AATGGTTTGT GAACTTGCTC ACCTGGACCA CTGTATCCTG CCACTGTCCT 
1951 TCCTGGTCTC GCACTGCCAC TGCATGGCCT CCTGTCACTG TGAATCGTGG 
2001 CCCAGTCTCA GTTTGTAGTT TCTCATTAAA TTGGCCCTTT CACTCCCCCA 
2051 AAAAAAAAA 


BLAST Results 
Entry HS727347 from database EMBL:
human STS WI-16589.
Length = 275
Plus Strand HSPs:

Score = 1365 (204.8 bits). Expect = 3.0e-55, P = 3.0e-55

Identities = 275/276 (99%), Positives = 275/276 (99%), Strand = Plus / Plus


Medline entries 


No Medline entry 


Peptide information for frame 3 


1 MSSEPPPPYP GGPTAPLLEE KSGAPPTPGR SSPAVMQPPP GMPLPPADIG
51 PPPYEPPGHP MPQPGFIPPH MSADGTYMPP GFYPPPGPHP PMGYYPPGPY
101 TPGPYPGPGG HTATVLVPSG AATTVTVLQG EIFEGAPVQT VCPHCQQAIA
151 TKISYEIGLM NFVLGFFCCF MGCDLGCCLI PCLINDFKDV THTCPSCKAY
201 IYTYKRLC

ORF from 177 bp to 800 bp; peptide length: 208
Category: similarity to known protein


BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_82g14, frame 3

PIR:S57447 HPBRII-7 protein - human, N = 1, Score = 206, P = 8.4e-16

PIR:A47655 spliceosome-associated protein SAP 62 - human, N = 1, Score =
198, P = 4.3e-15


>PIR:S57447 HPBRII-7 protein - human 
Length = 551

HSPs:

Score = 206 (30.9 bits). Expect = 8.4e-16, P = 8.4e-16
Identities = 57/115 (49%), Positives = 62/115 (53%)

Query: 5 PPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPP PYEP 56
PPPP+P G T P G P PG P PPPG LPP GPP P P
Sbjct: 226 PPPPFPAGQTPP--RPPLGPPGPPGPPGP PPPGQVLPPPLAGPPNRGDRPPPPVLF 279

Query: 57 PGHPMPQP--GFIPPHMSADGTYMP-PGFYPPPGPHPPM-GYYPP-GPYTPGPYPGPGGH 111
PG P QP G +PP G P PG+ PPPGP PP G PP GP+ P P PGP G
Sbjct: 280 PGQPFGQPPLGPLPP GPPPPVPGYGPPPGPPPPQQGPPPPPGPFPPRP-PGPLGP 333

Query: 112 TATVLVP 118
T+ P
Sbjct: 334 PLTLAPP 340


Score = 177 (26.6 bits). Expect = 1.1e-12, P = 1.1e-12
Identities = 55/120 (45%), Positives = 61/120 (50%)

Query: 5 PPPPYPGGPTAP--LLEEKSGAPPTPG-RSSPAVM QP PPGMPLPPADIGPPPYE 55
P PP P GP P +L PP G R P V+ QP PP PLPP GPPP
Sbjct: 244 PGPPGPPGPPPPGQVLPPPLAGPPNRGDRPPPPVLFPGQPFGQPPLGPLPP--GPPP-P 299

Query: 56 PPGHPMPQPGFIPPHMSADGTYMPPGFYPP--PGP-HPPMGYYPPGPYTPGPYPG PG 109
PG+ P PG PP G PPG +PP PGP PP+ PP P+ PGP PG P
Sbjct: 300 VPGYG-PPPGPPPPQQ---GPPPPPGPFPPRPPGPLGPPLTLAPP-PHLPGPPPGAPPPA 354

Query: 110 GHTATVLVP 118
H P
Sbjct: 355 PHVNPAFFP 363


Score = 168 (25.2 bits). Expect = 1.1e-11, P = 1.1e-11
Identities = 47/118 (39%), Positives = 51/118 (43%)

Query: 5 PPPPYPG-GPTAPLLEEKSGAPPTPGRSSPAVMQP--PPGMPLPPADI-GPPPYEPPGHP 60
PPPP PG GP + G PP PG P P PP PP + GPPP PP P
Sbjct: 296 PPPPVPGYGPPPGPPPPQQGPPPPPGPFPPRPPGPLGPPLTLAPPPHLPGPPPGAPPPAP 355

Query: 61 MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG 120
P F PP ++ MP P P P G PP PY G Y PG T P
Sbjct: 356 HVNPAFFPPPTNSG MPTSDSRGPPPTDPYGR-PP-PYDRGDYGPPGREMDTARTPLS 410

Query: 121 AA 122
A
Sbjct: 411 EA 412

Score = 156 (23.4 bits). Expect = 2.1e-10, P = 2.1e-10
Identities = 44/103 (42%), Positives = 50/103 (48%)

Query: 6 PPPYPGGPTAPLLEEKSGAPPT-PGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHPMPQP 64
P PGG P G PP P +P +PP G P PP GPPP PG +P P
Sbjct: 208 PGAVPGGDRFPGPAGPGGPPPPFPAGQTPP--RPPLGPPGPPGPPGPPP PGQVLPPP 262

Query: 65 GFIPPHMSADGTYMPPGFYP-PPGPHPPMGYYPPGPYTP GPYPGP 108
PP+ D PP +P P PP+G PPGP P GP PGP
Sbjct: 263 LAGPPNRG-DRP-PPPVLFPGQPFGQPPLGPLPPGPPPPVPGYGPPPGP 309

Score = 121 (18.2 bits). Expect = 5.2e-05, P = 5.2e-05
Identities = 40/90 (44%), Positives = 45/90 (50%)

Query: 23 GAPPTPGRSSPAVMQPP-PGMPLPPAD-IGPP-PYEPPGHPMPQPG-FIPPHMSADGTYM 78
G PG + P PP P PP +GPP P PPG P P PG +PP ++
Sbjct: 213 GGDRFPGPAGPGGPPPPFPAGQTPPRPPLGPPGPPGPPG-P-PPPGQVLPPPLAG 265

Query: 79 PP--GFYPPPG PHPPMGYYPPGPYTPGPYPG-PG 109
PP G PPP P P G P GP PGP P PG
Sbjct: 266 PPNRGDRPPPPVLFPGQPFGQPPLGPLPPGPPPPVPG 302
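
The percentages in the HSP summaries above appear to be truncated rather than rounded: 57/115 is about 49.6%, yet it is printed as 49%. The following Python sketch checks that reading against the figures reported for this entry. The helper name `hsp_percent` is hypothetical and is not part of any BLAST program, and the truncation rule is only an inference from the printed numbers.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Return the integer percentage of matches, truncated toward zero.

    Truncation (rather than rounding) is an assumption inferred from the
    figures printed in the report, for example 57/115 -> 49%.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# (matches, aligned length, percentage printed in the report)
reported = [
    (57, 115, 49), (62, 115, 53),
    (55, 120, 45), (61, 120, 50),
    (47, 118, 39), (51, 118, 43),
    (44, 103, 42), (50, 103, 48),
    (40, 90, 44), (45, 90, 50),
]
for matches, length, printed in reported:
    assert hsp_percent(matches, length) == printed
```

If rounding to the nearest integer were used instead, several of these pairs (57/115, 55/120, 47/118) would come out one point higher than the printed value.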


Pedant information for DKFZphfbr2_82g14, frame 3


Report for DKFZphfbr2_82g14.3


[LENGTH] 208
[MW] 21862.47
[pI] 5.55
[PROSITE] MYRISTYL 3
[PROSITE] PKC_PHOSPHO_SITE 2
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 39.90 %


SEQ MSSEPPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHP
SEG . . . .xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccchhhhhhccccccccccccccccccccccccccccccccccccccc
MEM

SEQ MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG
SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccccccccccccccccccccccccccccccccccccceeeeecccc
MEM

SEQ AATTVTVLQGEIFEGAPVQTVCPHCQQAIATKISYEIGLMNFVLGFFCCFMGCDLGCCLI
SEG
PRD cceeeeeeeeeeecccceeeeccchhhhhhhhhhhhhhhceeeeeeeeeecccccceeec
MEM MMMMMMMMMMMMM

SEQ PCLINDFKDVTHTCPSCKAYIYTYKRLC
SEG
PRD eeeecccccccccccccceeeeeeeccc
MEM MMMM


Prosite for DKFZphfbr2_82g14.3


PS00005 196->199 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00008 109->115 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 172->178 MYRISTYL PDOC00008


(No Pfam data available for DKFZphfbr2_82g14.3)
DKFZphfbr2_82i17


group: signal transduction 

DKFZphfbr2_82i17 encodes a novel 95 amino acid protein with similarity to the plasma membrane
substrate for the cAMP-dependent protein kinase.

The novel protein is a transmembrane protein with strong similarity to phospholemman,
a membrane substrate for the cAMP-dependent protein kinase. It seems to serve as a
chloride channel or as a chloride-channel regulator.

The new protein can find application in modulating/blocking cAMP-dependent protein
kinase-dependent pathways.


similarity to plasma membrane substrate for cAMP-dependent protein kinase

complete cDNA, complete cds, EST hits

potential start at Bp 31 matches Kozak consensus PyNKatgG
might be a SODIUM/POTASSIUM-TRANSPORTING ATPASE
TRANSMEMBRANE 1

Sequenced by DKFZ

Locus: /map="11; 920_E_12; 786_(A,H)_H; (797, 802)_(E, H)_7"
Insert length: 1647 bp 

Poly A stretch at pos. 1637, polyadenylation signal at pos. 1615 


1 AGTCTCGGAG GGGACCGGCT GTGCAGACGC CATGGAGTTG GTGCTGGTCT 
51 TCCTCTGCAG CCTGCTGGCC CCCATGGTCC TGGCCAGTGC AGCTGAAAAG 
101 GAGAAGGAAA TGGACCCTTT TCATTATGAT TACCAGACCC TGAGGATTGG 
151 GGGACTGGTG TTCGCTGTGG TTCTCTTCTC GGTTGGGATC CTCCTTATCC 
201 TAAGTCGCAG GTGCAAGTGC AGTTTCAATC AGAAGCCCCG GGCCCCAGGA 
251 GATGAGGAAG CCCAGGTGGA GAACCTCATC ACCGCCAATG CAACAGAGCC 
301 CCAGAAAGCA GAGAACTGAA GTGCAGCCAT CAGGTGGAAG CCTCTGGAAC 
351 CTGAGGCGGC TGCTTGAACC TTTGGATGCA AATGTCGATG CTTAAGAAAA 
401 CCGGCCACTT CAGCAACAGC CCTTTCCCCA GGAGAAGCCA AGAACTTGTG 
451 TGTCCCCCAC CCTATCCCCT CTAACACCAT TCCTCCACCT GATGATGCAA 
501 CTAACACTTG CCTCCCCGCT GCAGCCTGTG GTCCTGCCCA CCTCCCGTGA 
551 TGTGTGTGTG TGTGTGTGTG TGTGTGACTG TGTGTGTTTG CTAACTGTGG 
601 TCTTTGTGGC TACTTGTTTG TGGATGGTAT TGTGTTTGTT AGTGAACTGT 
651 GGACTCGCTT TCCCAGGCAG GGGCTGAGCC ACACGGCCAT CTGCTCCTCC 
701 CTGCCCCCGT GGCCCTCCAT CACCTTCTGC TCCTAGGAGG CTGCTTGTTG 
751 CCCGAGACCA GCCCCCTCCC CTGATTTAGG GATGCGTAGG GTAAGAGCAC 
801 GGGCAGTGGT CTTCAGTCGT CTTGGGACCT GGGAAGGTTT GCAGCACTTT 
851 GTCATCATTC TTCATGGACT CCTTTCACTC CTTTAACAAA AACCTTGCTT 
901 CCTTATCCCA CCTGATCCCA GTCTGAAGGT CTCTTAGCAA CTGGAGATAC 
951 AAAGCAAGGA GCTGGTGAGC CCAGCGTTGA CGTCAGGCAG GCTATGCCCT 
1001 TCCGTGGTTA ATTTCTTCCC AGGGGCTTCC ACGAGGAGTC CCCATCTGCC 
1051 CCGCCCCTTC ACAGAGCGCC CGGGGATTCC AGGCCCAGGG CTTCTACTCT 
1101 GCCCCTGGGG AATGTGTCCC CTGCATATCT TCTCAGCAAT AACTCCATGG 
1151 GCTCTGGGAC CCTACCCCTT CCAACCTTCC CTGCTTCTGA GACTTCAATC 
1201 TACAGCCCAG CTCATCCAGA TGCAGACTAC AGTCCCTGCA ATTGGGTCTC 
1251 TGGCAGGCAA TAGTTGAAGG ACTTCCTGTT CCGTTGGGGC CAGCACACCG 
1301 GGATGGATGG AGGGAGAGCA GAGGCCTTTG CTTCTCTGCC TACGTCCCCT 
1351 TAGATGGGCA GCAGAGGCAA CTCCCGCATC CTTTGCTCTG CCTGTCAGTG 
1401 GTCAGAGCGG TGAGCGAGGT GGGTTGGAGA CTCAGCAGGC TCCGTGCAGC 
1451 CCTTGGGAAC AGTGAGAGGT TGAAGGTCAT AACGAGAGTG GGAACTCAAC
1501 CCAGATCCCG CCCCTCCTGT CCTCTGTGTT CCCGCGGAAA CCAACCAAAC 
1551 CGTGCGCTGT GACCCATTGC TGTTCTCTGT ATCGTGACCT ATCCTCAACA 
1601 ACAACAGAAA AAAGGAATAA AATATCCTTT GTTTCCTAAA AAAAAAA 
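
The poly A and polyadenylation signal positions quoted for this clone can be reproduced from the listing itself. The Python sketch below scans the last line of the cDNA for the two common signal hexamers and for the start of the trailing A run. The function names and the two-motif set are illustrative assumptions, and the observation that the reported positions behave like 0-based offsets is an inference from the numbers, not something the document states.

```python
import re

# Common polyadenylation signal hexamers (an assumed motif set).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str, offset: int = 0) -> list:
    """Return sorted (position, motif) pairs; positions are 0-based plus offset."""
    seq = seq.upper().replace(" ", "")
    hits = []
    for motif in POLYA_SIGNALS:
        # A lookahead is used so that overlapping occurrences are also found.
        for m in re.finditer("(?=" + motif + ")", seq):
            hits.append((m.start() + offset, motif))
    return sorted(hits)


def polya_tail_start(seq: str, offset: int = 0) -> int:
    """Return the 0-based position (plus offset) where the trailing A run begins."""
    seq = seq.upper().replace(" ", "")
    return len(seq.rstrip("A")) + offset


# Last line of the cDNA listing above. Its first base is base 1601,
# which corresponds to a 0-based offset of 1600.
tail = "ACAACAGAAA AAAGGAATAA AATATCCTTT GTTTCCTAAA AAAAAAA"
print(find_polya_signals(tail, offset=1600))  # [(1615, 'AATAAA')]
print(polya_tail_start(tail, offset=1600))    # 1637
```

Both values agree with the "Poly A stretch at pos. 1637, polyadenylation signal at pos. 1615" line of this entry when positions are counted from zero.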


BLAST Results 


Entry HS31455 from database EMBL:
human STS WI-2739.
Length = 103
Minus Strand HSPs:

Score = 487 (73.1 bits). Expect = 4.4e-14, P = 4.4e-14

Identities = 101/104 (97%), Positives = 101/104 (97%), Strand = Minus / Plus

frame shift in primer binding site
Medline entries


91250422:
Purification and complete sequence determination of the major plasma membrane substrate
for cAMP-dependent protein kinase and protein kinase C in myocardium.

95091702:
Protein kinase C and cyclic AMP-dependent protein kinase phosphorylate phospholemman,
an insulin and adrenaline-regulated membrane phosphoprotein, at specific sites in the
carboxy terminal domain.

95138184:
Mat-8, a novel phospholemman-like protein expressed in human breast tumors, induces a
chloride conductance in Xenopus oocytes.


Peptide information for frame 2


1 MELVLVFLCS LLAPMVLASA AEKEKEMDPF HYDYQTLRIG GLVFAVVLFS
51 VGILLILSRR CKCSFNQKPR APGDEEAQVE NLITANATEP QKAEN

ORF from 32 bp to 316 bp; peptide length: 95
Category: strong similarity to known protein


BLASTP hits


No BLASTP hits available


Alert BLASTP hits for DKFZphfbr2_82i17, frame 2

SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 196, P =
1.2e-15

TREMBL:AF091390_1 product: "phospholemman precursor"; Mus musculus
phospholemman precursor, gene, complete cds., N = 1, Score = 187, P =
1.1e-14

PIR:A40533 cAMP-dependent protein kinase major membrane substrate
precursor - dog, N = 1, Score = 189, P = 6.5e-15

SWISSPROT:PLM_RAT PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 185, P =
1.7e-14


>SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR.
Length = 92

HSPs:

Score = 196 (29.4 bits). Expect = 1.2e-15, P = 1.2e-15
Identities = 43/85 (50%), Positives = 56/85 (65%)

Query: 4 VLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRRCKC 63
+LVF LL + AE KE DPF YDYQ+L+IGGLV A +LF +GIL++LSRRC+C
Sbjct: 7 ILVFCVGLLT MAKAESPKEHDPFTYDYQSLQIGGLVIAGILFILGILIVLSRRCRC 62

Query: 64 SFNQKPRA--PGDEEAQVENLITANAT 88
FNQ+ R P +EE +I +T
Sbjct: 63 KFNQQQRTGEPDEEEGTFRSSIRRLST 89

Pedant information for DKFZphfbr2_82i17, frame 2


Report for DKFZphfbr2_82i17.2


[LENGTH] 95
[MW] 10542.37
[pI] 5.05
[HOMOL] SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 3e-15
[BLOCKS] BL01310
[EC] 3.6.1.37 Na+/K+-exchanging ATPase 6e-08
[PIRKW] transmembrane protein 1e-09
[PIRKW] hydrolase 6e-08
[PROSITE] ATP1G1_PLM_MAT8 1
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 1
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] SIGNAL_PEPTIDE 19


SEQ MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRR
PRD ccchhhhhhhhhhccccccccccccccccccceeeeecccceeeehhhhhhheeeeehhh

SEQ CKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN
PRD hhhcccccccccccchhhhhhhhhhhccccccccc


Prosite for DKFZphfbr2_82i17.2

PS00001 86->90 ASN_GLYCOSYLATION PDOC00001

PS00005 36->39 PKC_PHOSPHO_SITE PDOC00005

PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005

PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006

PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007

PS00008 41->47 MYRISTYL PDOC00008

PS01310 28->42 ATP1G1_PLM_MAT8 PDOC01014
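
Several of the Prosite rows above correspond to short consensus patterns that can be checked directly against the 95 residue peptide. The Python sketch below writes four of those patterns (PS00001, PS00005, PS00006, PS00008) as regular expressions and reports hits in the table's "start->end" style. The function name `scan_sites` is hypothetical, and the convention that the end coordinate is one past the last matched residue is an assumption inferred from the table, not a documented rule.

```python
import re

# PROSITE site patterns written as regular expressions:
#   PS00001  N-{P}-[ST]-{P}
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan_sites(peptide: str) -> dict:
    """Return {site name: [(start, end), ...]} with 1-based start positions.

    `end` is start + match length, i.e. one past the last matched residue
    (assumed to mirror the "start->end" columns of the report).
    """
    hits = {}
    for name, pattern in PATTERNS.items():
        # The lookahead makes overlapping matches visible.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [
            (m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(peptide)
        ]
    return hits


peptide = (
    "MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRR"
    "CKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN"
)
for name, sites in scan_sites(peptide).items():
    print(name, sites)
```

For this peptide the sketch finds one N-glycosylation site (86->90), two PKC sites (36->39 and 58->61), one CK2 site (19->23) and one myristoylation site (41->47), matching the corresponding rows of the table.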


(No Pfam data available for DKFZphfbr2_82i17.2)




DKFZphfbr2_82i24 


group: nucleic acid management 

DKFZphfbr2_82i24 encodes a novel 547 amino acid protein with similarity to DEAD-box
superfamily ATP-dependent helicases.

RNA helicases comprise a large family of proteins that are involved in basic biological
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in transcription
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP
hydrolysis.

The novel protein contains a DEAD-box, an ATP/GTP-binding site motif A (P-loop, interacting
with one of the phosphate groups of the nucleotide) and a leucine zipper. Mutations in the
closely related Drosophila Hlc gene result in lethality in homozygotes. Therefore the new
protein seems to be critically involved in RNA processing in eukaryotic cells.

The new protein can find application in modulating RNA metabolism and gene expression. 


strong similarity to DEAD-box subfamily ATP-dependent helicase 
complete cDNA, complete cds, EST hits 

potential start at Bp 9 matches Kozak consensus PyNNatgG
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases

Sequenced by DKFZ

Locus: /map="720A_3; 758_H 4; 772_E_3; e04_A5; 175.5 cR from top of Chr7 linkage group"
Insert length: 1860 bp 

Poly A stretch at pos. 1850, polyadenylation signal at pos. 1829 


1 AGCAGCGCCA TGGAGGACTC TGAAGCACTG GGCTTCGAAC ACATGGGCCT 
51 CGATCCCCGG CTCCTTCAGG CTGTCACCGA TCTGGGCTGG TCGCGACCTA 
101 CGCTGATCCA GGAGAAGGCC ATCCCACTGG CCCTAGAAGG GAAGGACCTC 
151 CTGGCTCGGG CCCGCACGGG CTCCGGGAAG ACGGCCGCTT ATGCTATTCC 
201 GATGCTGCAG CTGTTGCTCC ATAGGAAGGC GACAGGTCCG GTGGTAGAAC 
251 AGGCAGTGAG AGGCCTTGTT CTTGTTCCTA CCAAGGAGCT GGCACGGCAA 
301 GCACAGTCCA TGATTCAGCA GCTGGCTACC TACTGTGCTC GGGATGTCCG 
351 AGTGGCCAAT GTCTCAGCTG CTGAAGACTC AGTCTCTCAG AGAGCTGTGC 
401 TGATGGAGAA GCCAGATGTG GTAGTAGGGA CCCCATCTCG CATATTAAGC 
451 CACTTGCAGC AAGACAGCCT GAAACTTCGT GACTCCCTGG AGCTTTTGGT 
501 GGTGGACGAA GCTGACCTTC TTTTTTCCTT TGGCTTTGAA GAAGAGCTCA 
551 AGAGTCTCCT CTGTCACTTG CCCCGGATTT ACCAGGCTTT TCTCATGTCA 
601 GCTACTTTTA ACGAGGACGT ACAAGCACTC AAGGAGCTGA TATTACATAA 
651 CCCGGTTACC CTTAAGTTAC AGGAGTCCCA GCTGCCTGGG CCAGACCAGT 
701 TACAGCAGTT TCAGGTGGTC TGTGAGACTG AGGAAGACAA ATTCCTCCTG 
751 CTGTATGCCC TGCTCAAGCT GTCATTGATT CGGGGCAAGT CTCTGCTCTT 
801 TGTCAACACT CTAGAACGGA GTTACCGGCT ACGCCTGTTC TTGGAACAGT 
851 TCAGCATCCC CACCTGTGTG CTCAATGGAG AGCTTCCACT GCGCTCCAGG 
901 TGCCACATCA TCTCACAGTT CAACCAAGGC TTCTACGACT GTGTCATAGC 
951 AACTGATGCT GAAGTCCTGG GGGCCCCAGT CAAGGGCAAG CGTCGGGGCC 
1001 GAGGGCCCAA AGGGGACAAG GCCTCTGATC CGGAAGCAGG TGTGGCCCGG 
1051 GGCATAGACT TCCACCATGT GTCTGCTGTG CTCAACTTTG ATCTTCCCCC 
1101 AACCCCTGAG GCCTACATCC ATCGAGCTGG CAGGACAGCA CGCGCTAACA 
1151 ACCCAGGCAT AGTCTTAACC TTTGTGCTTC CCACGGAGCA GTTCCACTTA 
1201 GGCAAGATTG AGGAGCTTCT CAGTGGAGAG AACAGGGGCC CCATTCTGCT 
1251 CCCCTACCAG TTCCGGATGG AGGAGATCGA GGGCTTCCGC TATCGCTGCA
1301 GGGATGCCAT GCGCTCAGTG ACTAAGCAGG CCATTCGGGA GGCAAGATTG 
1351 AAGGAGATCA AGGAAGAGCT TCTGCATTCT GAGAAGCTTA AGACATACTT 
1401 TGAAGACAAC CCTAGGGACC TCCAGCTGCT GCGGCATGAC CTACCTTTGC 
1451 ACCCCGCAGT GGTGAAGCCC CACCTGGGCC ATGTTCCTGA CTACCTGGTT 
1501 CCTCCTGCTC TCCGTGGCCT GGTACGCCCT CACAAGAAGC GGAAGAAGCT 
1551 GTCTTCCTCT TGTAGGAAGG CCAAGAGAGC AAAGTCCCAG AACCCACTGC 
1601 GCAGCTTCAA GCACAAAGGA AAGAAATTCA GACCCACAGC CAAGCCCTCC 
1651 TGAGGTTGTT GGGCCTCTCT GGAGCTGAGC ACATTGTGGA GCACAGGCTT 
1701 ACACCCTTCG TGGACAGGCG AGGCTCTGGT GCTTACTGCA CAGCCTGAAC 
1751 AGACAGTTCT GGGGCCGGCA GTGCTGGGCC CTTTAGCTCC TTGGCACTTC 
1801 CAAGCTGGCA TCTTGCCCCT TGACAACAGA ATAAAAATTT TAGCTGCCCC 
1851 AAAAAAAAAA 


BLAST Results 
Entry HSG05793 from database EMBL:
human STS WI-6581.
Length = 206
Minus Strand HSPs:

Score = 992 (148.8 bits). Expect = 6.0e-38, P = 6.0e-38

Identities = 204/208 (98%), Positives = 204/208 (98%), Strand = Minus /


Entry AC004938 from database EMBL: 

Homo sapiens clone DJ0971C03; HTGS phase 1, 18 unordered pieces. 
Score = 1269, P = 6.5e-202, identities = 269/282 
12 exons. Bp -87920-93706 (matching 1-1497) 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 10 bp to 1650 bp; peptide length: 547 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (51-59)
LEUCINE_ZIPPER (149-171)


1 MEDSEALGFE HMGLDPRLLQ AVTDLGWSRP TLIQEKAIPL ALEGKDLLAR
51 ARTGSGKTAA YAIPMLQLLL HRKATGPVVE QAVRGLVLVP TKELARQAQS
101 MIQQLATYCA RDVRVANVSA AEDSVSQRAV LMEKPDVVVG TPSRILSHLQ
151 QDSLKLRDSL ELLVVDEADL LFSFGFEEEL KSLLCHLPRI YQAFLMSATF
201 NEDVQALKEL ILHNPVTLKL QESQLPGPDQ LQQFQVVCET EEDKFLLLYA
251 LLKLSLIRGK SLLFVNTLER SYRLRLFLEQ FSIPTCVLNG ELPLRSRCHI
301 ISQFNQGFYD CVIATDAEVL GAPVKGKRRG RGPKGDKASD PEAGVARGID
351 FHHVSAVLNF DLPPTPEAYI HRAGRTARAN NPGIVLTFVL PTEQFHLGKI
401 EELLSGENRG PILLPYQFRM EEIEGFRYRC RDAMRSVTKQ AIREARLKEI
451 KEELLHSEKL KTYFEDNPRD LQLLRHDLPL HPAVVKPHLG HVPDYLVPPA
501 LRGLVRPHKK RKKLSSSCRK AKRAKSQNPL RSFKHKGKKF RPTAKPS
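
The ORF coordinates and peptide lengths quoted in these reports are mutually consistent if the ORF range excludes the stop codon, and the first codons of the listed cDNA translate to the start of the listed peptide. The Python sketch below illustrates both checks using the standard genetic code. The function names are hypothetical, and the reading that the end coordinate is the last base of the last sense codon is an inference from the numbers.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates.

    Assumes the range covers whole codons and excludes the stop codon.
    """
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF range is not a whole number of codons")
    return n_bases // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# ORF ranges and peptide lengths as reported for three clones in this section.
assert orf_peptide_length(177, 800) == 208
assert orf_peptide_length(32, 316) == 95
assert orf_peptide_length(10, 1650) == 547

# First line of the DKFZphfbr2_82i24 cDNA listing; the ORF starts at base 10.
first_line = "AGCAGCGCCA TGGAGGACTC TGAAGCACTG GGCTTCGAAC ACATGGGCCT"
print(translate(first_line.replace(" ", "")[9:]))  # MEDSEALGFEHMG
```

The translated fragment agrees with the first 13 residues of the peptide listed above.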


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82i24, frame 1

TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby
sox (bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa
(lcs) genes, complete cds., N = 1, Score = 1230, P = 3.2e-125

TREMBL:SPCC1494_6 gene: "SPCC1494.06c"; product: "atp dependent
helicase"; S.pombe chromosome ii cosmid c1494., N = 2, Score = 753, P =


PIR:S51412 hypothetical protein YLR276c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 711, P = 8.2e-117

TREMBL:AF025451_2 gene: "C24H12.4"; Caenorhabditis elegans cosmid
C24H12., N = 2, Score = 564, P = 2.7e-99


>TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby sox
(bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa (lcs)
genes, complete cds.
Length = 560

HSPs:

Score = 1230 (184.5 bits). Expect = 3.2e-125, P = 3.2e-125
Identities = 251/497 (50%), Positives = 344/497 (69%)
Query: 9 FEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAAYAIPMLQL 68
F + LD R+L+AV LGW +PTLIQ AIPL LEGKD++ RARTGSGKTA YA+P++Q
Sbjct: 11 FHELELDQRILKAVAQLGWQQPTLIQSTAIPLLLEGKDVVVRARTGSGKTATYALPLIQK 70

Query: 69 LLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVS-AAEDSVSQ 127
+ L+ K EQ V +VL PTKEL RQ++ +I+QL C + VRVA+++ ++ D+V+Q
Sbjct: 71 ILNSKLNAS--EQYVSAVVLAPTKELCRQSRKVIEQLVESCGKVVRVADIADSSNDTVTQ 128

Query: 128 RAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEELKSLLCHL 187
R L E PD+VV TP+ +L++ + S+ +E LVVDEADL+F++G+E++ K L+ HL
Sbjct: 129 RHALSESPDIVVATPANLLAYAEAGSVVDLKHVETLVVDEADLVFAYGYEKDFKRLIKHL 188

Query: 188 PRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLL 247
P IYQA L+SAT +DV +K L L+NPVTLKL+E +L DQL +++ E E DK +
Sbjct: 189 PPIYQAVLVSATLTDDVVRMKGLCLNNPVTLKLEEPELVPQDQLSHQRILAE-ENDKPAI 247

Query: 248 LYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHIISQFNQG 307
LYALLKL LIRGKS++FVN+++R Y++RLFLEQF I CVLN ELP R H ISQFN+G
Sbjct: 248 LYALLKLRLIRGKSIIFVNSIDRCYKVRLFLEQFGIRACVLNSELPANIRIHTISQFNKG 307

Query: 308 FYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPE 367
YD +IA+D + P G + K ++ D E+ +RGIDF V+ V+NFD P
Sbjct: 308 TYDIIIASDEHHMEKP--GGKSATNRKSPRSGDMESSASRGIDFQCVNNVINFDFPRDVT 365

Query: 368 AYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELL----SGENRGPILLPYQFRMEEI 423
+YIHRAGRTAR NN G VL+FV E +E+ L + + I+ YQF+MEE+
Sbjct: 366 SYIHRAGRTARGNNKGSVLSFVSMKESKVNDSVEKKLCDSFAAQEGEQIIKNYQFKMEEV 425

Query: 424 EGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPLHPA 483
E FRYR +D R+ T+ A+ + R++EIK E+L+ EKLK +FE+N RDLQ LRHD PL
Sbjct: 426 ESFRYRAQDCWRAATRVAVHDTRIREIKIEILNCEKLKAFFEENKRDLQALRHDKPLRAI 485

Query: 484 VVKPHLGHVPDYLVPPALRGLV 505
V+ HL +P+Y+VP AL+ +V
Sbjct: 486 KVQSHLSDMPEYIVPKALKRVV 507

Pedant information for DKFZphfbr2_82i24, frame 1


Report for DKFZphfbr2_82i24.1


[LENGTH] 547
[MW] 61589.88
[pI] 9.34
[HOMOL] TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila melanogaster
tweety (tty), flightless (fli), dodo (dod), penguin (pen), small optic lobes (sol), innocent
bystander (iby), waclaw (waw), bobby sox (bbx), sluggish (slg), helicase (hlc), misato (mst),
and la costa (lcs) genes, complete cds. 1e-121
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 1e-109
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-42
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YKR059W] 3e-39
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKR059W] 3e-39
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160C] 3e-35
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119C] 3e-29
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290C] 4e-29
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 1e-27
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-27
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 4e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-05
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-34
[PIRKW] RNA binding 7e-41
[PIRKW] DEAD box 2e-38
[PIRKW] transmembrane protein 9e-20
[PIRKW] DNA binding 8e-23
[PIRKW] ATP 1e-107
[PIRKW] purine nucleotide binding 2e-38
[PIRKW] P-loop 1e-107
[PIRKW] hydrolase 2e-35
[PIRKW] protein biosynthesis 2e-38
[PIRKW] ATP binding 7e-43
[SUPFAM] WW repeat homology 1e-26
[SUPFAM] DEAD/H box helicase homology 1e-107
[SUPFAM] unassigned DEAD/H box helicases 1e-107
[SUPFAM] ATP-dependent RNA helicase DBP1 3e-31
[SUPFAM] ATP-dependent RNA helicase DHH1 2e-35
[SUPFAM] translation initiation factor eIF-4A 2e-38
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-26
[PROSITE] ATP_GTP_A 1
[PROSITE] LEUCINE_ZIPPER 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 9.87 %


SEQ MEDSEALGFEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAA
SEG
PRD ccccccccccccccchhhhhhhhhhccccccccccccccccccccceeeeecccccccee

SEQ YAIPMLQLLLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVSA
SEG
PRD ehhhhhhhhhhhcccccccccceeeeeeccchhhhhhhhhhhhhhhhhhhcceeeeeecc

SEQ AEDSVSQRAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEEL
SEG xxxxxxxxxxxx
PRD ccchhhhhhhhhcccceeeeccccchhhhhhcccccchhhhhhhhhhhhhhhhhcchhhh

SEQ KSLLCHLPRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCET
SEG
PRD hhhhhhccchhhhhhhhhccchhhhhhhhhhhcccceeeeeccccccchhhhhhhhhhhh

SEQ EEDKFLLLYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHI
SEG xxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhccceeeeeeehhhhhhhhhhhhhhcccceeeccccchhhhhhhh

SEQ ISQFNQGFYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNF
SEG xxxxxxxxxxxxx
PRD hhhhhccceeeeeeccccccccccccccccccccccccccccccccccccccceeeeeec

SEQ DLPPTPEAYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELLSGENRGPILLPYQFRM
SEG
PRD ccccccceeeeccccccccccccceeeeeecchhhhhhhhhhhhhhhccccccccccchh

SEQ EEIEGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPL
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhh^

SEQ HPAVVKPHLGHVPDYLVPPALRGLVRPHKKRKKLSSSCRKAKRAKSQNPLRSFKHKGKKF
SEG xxxxxxxxxxxxxxxxxx
PRD cccccccccccccceeeccccccccccccccccccchhhhhhcccccccccccccccccc

SEQ RPTAKPS
SEG
PRD ccccccc
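
The LOW_COMPLEXITY keyword of a report appears to be the fraction of residues masked with `x` on the SEG lines. The Python sketch below recomputes that fraction from the number of `x` characters per SEG line. The function name is hypothetical, and the assumption that the percentage is rounded to two decimals is inferred from the two reports for which both the SEG lines and the peptide length are available in this section.

```python
def low_complexity_percent(seg_lines, peptide_length: int) -> float:
    """Percentage of residues masked as low complexity ('x') on SEG lines."""
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# DKFZphfbr2_82i24.1: four masked runs of 12, 11, 13 and 18 residues, 547 aa.
seg_82i24 = ["x" * 12, "x" * 11, "x" * 13, "x" * 18]
print(low_complexity_percent(seg_82i24, 547))  # 9.87

# DKFZphfbr2_82g14.3: masked runs of 9, 28, 9 and 37 residues, 208 aa.
seg_82g14 = ["x" * 9 + " " + "x" * 28, "x" * 9 + " " + "x" * 37]
print(low_complexity_percent(seg_82g14, 208))  # 39.9
```

Both results agree with the LOW_COMPLEXITY values printed in the corresponding reports (9.87 % and 39.90).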


Prosite for DKFZphfbr2_82i24.1


PS00017 51->59 ATP_GTP_A PDOC00017
PS00029 149->171 LEUCINE_ZIPPER PDOC00029


Pfam for DKFZphfbr2_82i24.1


HMM_NAME DEAD and DEAH box helicases

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF
GL+P +L +++++G+++PT IQ++AIP++LEG+D++A+A TGSGKTAA+
Query 13 GLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAAY 61

HMM lIPMLQHIDwdP...WpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMn
+IPMLQ +++ + + + +R+L+L+PT ELA+Q Q +++++ ++
Query 62 AIPMLQLLLHRKATGPVVEQA-VRGLVLVPTKELARQAQSMIQQLATYCA 110

HMM g.IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDr.
+R++ + + Q +L+++P ++V++TP R++ H+++ +L+L++
Query 111 RDVRVANVSAAEDSVSQRAVLMEKP-DVVVGTPSRILSHLQQDSLKLRDS 159

HMM IeMLVMDEADRMLDMGFIDQIRrIMrqlPMpwNRQTMMFSATMPdeIqEL
+E LV DEAD +++ GF++++ ++P + Q + SAT+ +++Q L
Query 160 LELLVVDEADLLFSFGFEEELKSLLCHLP--RIYQAFLMSATFNEDVQAL 207

HMM ARrFMRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe*
+ +++NP+ + +++L + ++Q+ +++E E++KF +L+ L++
Query 208 KELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLLLYALLK 253


HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHCklMpQeERdeIMddFNnGEynVLIcTDV. . .
+L+ +L++ I+++++ G +P + R I+ +FN+G Y++ I+TD+
Query 272 YRLRLFLEQFSIPTCVLNGELPLRSRCHIISQFNQGFYDCVIATDAEVL 320

HMM ggRGIDIPdVNHVINYDMPWNPEqYI
+RGID+ V+ V N+D+P +PE YI
Query 321 GAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPEAYI 370

HMM QRIGRTgRIG*
+R+GRT+R++
Query 371 HRAGRTARAN 380




WO 01/12659


PCT/IB00/01496


DKFZphfbr2_82ml6 

group: brain derived 

DKFZphfbr2_82ml6 encodes a novel 289 amino acid protein with very weak similarity to
A. thaliana F28A23.140.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to A.thaliana F28A23.140 

complete cDNA, complete cds, few EST hits 
many ATGs in front of the ORF 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="4"

Insert length: 2715 bp 

Poly A stretch at pos. 2705, polyadenylation signal at pos. 2687

1 AGAGGAGGGG AGAGGACTGG GGAGCCGAGC CAGAGCCGGG CTGCCTGCCA 
51 CCCGGCTGCT CGTCCGCTAG CTGGGGAGGA GCGCTCCACC CGCAACTGAC 
101 AAAGGATGGG AGAATGCCCG CGCCCCGGGA TGCCGGCCGC ACGCAGCCTG 
151 GCGGCCGCCT GAGCTACTTC ACCCTCCGCC GGTAAGTGAC TGCAAACATC 
201 ATTCATTCAA TCAGCCTCAC TGGGAGCCCC TTCTCTCCGG CTGGTAGTCC 
251 TGGGCGGCTT GTCCCTGATC CCGAGCGGGG CTTGGCACAG CATCAGCCCT 
301 GGAGGGCAGG CAGCAGGTGC CTTTGCCTGG TGGGTCCACT GGGGAGCGTG 
351 GCTGGGGTTC GCGGCGGGTG CTGCCACCCA ACCTGCGGGC GGCGGGCTCG 
401 CCCAGTAGGC GCCTCTCTGG TGAGAGGAGG CGGCTCCAGC CCGCATCCTG 
451 GGGTAGTTGC TACTATTGGC CCCCAGCGCC CGCTCTGCGC GCGCGCCGTT 
501 TCTGGCGGAT CCCCAGTGCG CGGCGCGCTG TTTACACCGG CGTGGTACTA
551 GTCACGGAGC CGCACCCCTC GGAAAGCGCG GAGTCGATGA CAGCCACTTC 
601 ACAGGCTCAC GCGCTCCTAG TGTGGGCTTG AAGGGGACGG GGACCGATTA 
651 CCAAAGGAGA GCGCTGAGTA CGGAAGACAC AGGGCAGCCT TTGTCTTGGG 
701 TTTAGCGCTG ATGCGCTCAA CCCTGAGTCG GGTTCACTGC AACTGTTGTG 
751 TCCGATTTCG GTTCCCTGCA ACCGCCCTCC TGGGCGAGAG ATGTCATTGT 
801 GTTCCTGCGG CCAGCGGGAC TGAGAGCTGG GACTTAAGAC GCCAGGAGGG 
851 TCCTGCGCTC ACGGGAAATG TACCCCAAAA GAACTCTGAG AGAATATACT 
901 CAACTGTCCT GCTGTGATTA AACAAGACTG CTGTATTTTA ATTTCAGAAA 
951 TTGAAAAGGG ATAGGAGGAA GGGGAAAATG CTGGGCTGGT GTGAAGCGAT 
1001 AGCCCGTAAC CCTCACAGAA TTCCAAACAA CACGCGAACA CCCGAGATCT 
1051 CAGGGGATTT GGCTGACGCC TCACAAACCT CCACATTGAA TGAAAAATCC 
1101 CCAGGGCGAT CTGCAAGTCG ATCAAGTAAC ATTTCAAAAG CAAGCAGCCC 
1151 AACAACAGGG ACAGCTCCCA GGAGCCAGTC AAGGTTGTCT GTCTGTCCAT 
1201 CCACTCAGGA CATCTGCAGA ATCTGTCACT GCGAAGGGGA TGAAGAGAGC 
1251 CCCCTCATCA CACCCTGTCG CTGCACTGGG ACACTGCGCT TTGTCCACCA 
1301 GTCCTGCCTC CACCAGTGGA TAAAGAGCTC AGATACACGC TGCTGTGAGC 
1351 TCTGCAAGTA TGACTTCATA ATGGAGACCA AGCTCAAACC CCTCCGGAAG 
1401 TGGGAGAAAC TACAGATGAC CACAAGTGAA AGGAGGAAAA TATTCTGCTC 
1451 TGTCACATTC CACGTAATCG CGATCACCTG TGTGGTTTGG TCTTTGTATG 
1501 TATTGATAGA CCGGACAGCG GAGGAAATCA AGCAAGGCAA TGACAATGGT 
1551 GTCCTTGAAT GGCCATTTTG GACAAAACTG GTTGTGGTAG CCATTGGCTT 
1601 CACAGGAGGT CTTGTCTTCA TGTACGTACA GTGTAAAGTC TATGTTCAGT 
1651 TGTGGCGCAG GCTGAAGGCC TACAACCGTG TGATCTTTGT ACAAAATTGC 
1701 CCAGACACTG CCAAAAAACT GGAGAAGAAC TTCTCATGTA ATGTAAACAC 
1751 AGACATCAAA GATGCTGTGG TAGTGCCTGT ACCACAAACA GGTGCAAATT
1801 CACTGCCATC TGCAGAGGGT GGCCCCCCTG AAGTTGTATC AGTCTGATGG 
1851 AACCTGTTGG GAGTTTCTTC ACCGAAGAAT ATCTTTCTAG CCCTCAGCCA 
1901 CTACAAATGA CAGAAGTGAC CTTGAATTAT TTACTCCCTT CAGCTCCTCC 
1951 TTTCTCCTAC TGACACATTT TTCCTGACTT TGTTCAAAGA GGAAAGGAGA 
2001 AAAACAAACA AACAGACCAA ATGCCCAGGA GCCCATGAAG TAATAGCGTA 
2051 AAGTAAAGTA TGATATGGAA ATGTGAAGTT TGCAAGAGAA TGATTTCCAA 
2101 GACAATTAAG AACTACTGGG GCAATGAATG CTTTTAGGCA GTAATCAAAG 
2151 ATTAAATGGA CCCATGATAC TCTTCTTCAC AGTAACAGGG GAAAAGTTCA 
2201 AGAATACAGA CTTGAATTGC GATGTGTATT ACTTCTAGGG CCTTGTAATG 
2251 TTAACTGTCT CATCTGGAAA TAATAACTAA CATATTTGGT TTTAAGCCTG 
2301 AAATTGTCTG CATTATCCCT AAGTCACATT GGAAGTGAAC TTGGAGGATG 
2351 CATATTTTGA TATGCTTTGA CAGCTAACAG ATTTGTATGG TTTAGTGGAG 
2401 TCTGGTTATT TTGACAGATG CATGTTTTTT TTAAATAGAT GCAATATACA 
2451 TTTGAAGACA TTGATATTTG GAATTAATTA TGTTTGTTTA AGTCACGCAA 
2501 AAGATTTTCA GAAAATGTTC GGATATAATT AGCTCTGTTA AATACCCACA 
2551 GAACTGTTAT CAGGTCTTAT ATTTATTTTC ATCTGGTTCC TCTAATACAG 
2601 TGCTGTCCAA TAGAAACACA ACAGCCACAA ATGCAGGCCA CAGATGCAAA
2651 TATTTAACTT CCCAGTAGCC CTATTTTAAA AAGTAAAAAT AAATGTTTGT
2701 TTGTTAAAAA AAAAA 
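
The polyadenylation signal noted above can be checked with a short scan for the canonical AATAAA hexamer. The following is a minimal sketch, not the annotation pipeline used for this report; the function name find_polya_signals and the choice of 1-based output are assumptions made for illustration. The input is the last 65 bases of the insert (positions 2651 to 2715) as printed above.

```python
import re

def find_polya_signals(seq, motif="AATAAA"):
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    seq = seq.upper().replace(" ", "")
    # A zero-width lookahead lets overlapping occurrences be reported as well.
    return [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]

# Last 65 bases of the DKFZphfbr2_82ml6 insert (positions 2651-2715).
tail = ("TATTTAACTT CCCAGTAGCC CTATTTTAAA AAGTAAAAAT AAATGTTTGT"
        "TTGTTAAAAA AAAAA")
offset = 2650  # number of bases preceding the tail

hits = [offset + p for p in find_polya_signals(tail)]
print(hits)
```

With 1-based counting the hexamer starts at 2688, one base after the reported position 2687. The report does not state its coordinate convention, so the difference may reflect 0-based counting or a marker placed on the preceding base.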


BLAST Results 


Entry G37457 from database EMBLNEW: 
SHGC-57357 Human Homo sapiens STS genomic. 

Length = 458 
Plus Strand HSPs: 

Score = 2116 (317.5 bits), Expect = 4.3e-91, P = 4.3e-91
Identities = 444/456 (97%) 


Medline entries 


No Medline entry 


Peptide information for frame 3 


1 MLGWCEAIAR NPHRIPNNTR TPEISGDLAD ASQTSTLNEK SPGRSASRSS 
51 NISKASSPTT GTAPRSQSRL SVCPSTQDIC RICHCEGDEE SPLITPCRCT 
101 GTLRFVHQSC LHQWIKSSDT RCCELCKYDF IMETKLKPLR KWEKLQMTTS
151 ERRKIFCSVT FHVIAITCVV WSLYVLIDRT AEEIKQGNDN GVLEWPFWTK
201 LVVVAIGFTG GLVFMYVQCK VYVQLWRRLK AYNRVIFVQN CPDTAKKLEK
251 NFSCNVNTDI KDAVVVPVPQ TGANSLPSAE GGPPEVVSV

ORF from 978 bp to 1844 bp; peptide length: 289
Category: similarity to unknown protein 
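
The ORF coordinates, reading frame and peptide length given above are linked by simple arithmetic, which the sketch below makes explicit. It assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; both assumptions agree with the numbers printed in this report but are not stated in it. The function name orf_summary is hypothetical.

```python
def orf_summary(start, end):
    """Derive reading frame and codon count from 1-based inclusive ORF coordinates.

    Assumes the stop codon is not included in the start..end range.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1  # frame 1 begins at nucleotide 1
    return {"frame": frame, "codons": length_nt // 3}

summary = orf_summary(978, 1844)
print(summary)  # {'frame': 3, 'codons': 289}
```

The result, frame 3 with 289 codons, agrees with the "frame 3" heading and the stated peptide length.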


BLASTP hits 
Entry AB011169_1 from database TREMBL: 

gene: "KIAA0597"; product: "KIAA0597 protein"; Homo sapiens mRNA for 

KIAA0597 protein, partial cds. 

Score = 188, P = 6.0e-12, identities = 30/54, positives = 38/54
Entry SPBC14F5_7 from database TREMBL:

gene: "SPBC14F5.07"; product: "hypothetical protein"; S.pombe
chromosome II cosmid c14F5.

Score = 185, P = 1.9e-11, identities = 29/53, positives = 38/53
Entry CEY57A10B_1 from database TREMBL:

gene: "Y57A10B.1"; Caenorhabditis elegans cosmid Y57A10B

Score = 171, P = 2.6e-10, identities = 40/107, positives = 58/107


Alert BLASTP hits for DKFZphfbr2_82ml6, frame 3 

TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII
project), N = 1, Score = 198, P = 3.4e-13


>TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII project)
Length = 1,051

HSPs:

Score = 198 (29.7 bits), Expect = 3.4e-13, P = 3.4e-13
Identities = 38/103 (36%), Positives = 61/103 (59%)

Query: 28 LADASQTSTLNEKSPGRSASRS-SNISKASSPTTGTAPRSQSRLSVCPSTQDICRICHCE 86

+++ S +S+ + SP +++ SN+ A S TG+ +D+CRIC
Sbjct: 20 VSEPSVSSSSSSSSPNQASPNPFSNMDPAVSTATGSRYVDDDE-----DEEDVCRICRNP 74

Query: 87 GDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDTRCCELCKYDF 130 

GD ++PL PC C+G+++FVHQ CL QW+ S+ R CE+CK+ F 
Sbjct: 75 GDADNPLRYPCACSGSIKFVHQDCLLQWLNHSNARQCEVCKHPF 118 
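
The identity figures in the HSP header can be recomputed from the aligned strings. The sketch below counts identical columns in the second alignment block (query residues 87 to 130) and then derives the percentage for the whole HSP. The helper names are hypothetical, and the use of integer truncation rather than rounding is an inference from the printed values (38/103 is reported as 36%), not something stated in the report.

```python
def count_identities(query, subject):
    """Count columns where both aligned sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")

def percent_truncated(matches, aligned):
    """Percentage with the fractional part dropped (assumed reporting style)."""
    return 100 * matches // aligned

# Second block of the alignment shown above.
query_block = "GDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDTRCCELCKYDF"
sbjct_block = "GDADNPLRYPCACSGSIKFVHQDCLLQWLNHSNARQCEVCKHPF"

block_identities = count_identities(query_block, sbjct_block)
print(block_identities)            # identical columns in this block
print(percent_truncated(38, 103))  # identities over the whole HSP
print(percent_truncated(61, 103))  # positives over the whole HSP
```

This block contributes 23 of the 38 identities reported for the HSP.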
Pedant information for DKFZphfbr2_82ml6, frame 3


Report for DKFZphfbr2_82ml6.3


[LENGTH] 289
[MW] 32308.36
[pI] 8.76
[HOMOL] PIR:T00268 hypothetical protein KIAA0597 - human (fragment) 9e-14
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YIL030c] 4e-09
[PIRKW] transmembrane protein 9e-08
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 3
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 6.57 %


SEQ MLGWCEAIARNPHRIPNNTRTPEISGDLADASQTSTLNEKSPGRSASRSSNISKASSPTT 

SEG xxxxxxxxxxxxxxxxxxx. .

PRD ccchhhhhhccccccccccccccccchhhhhhhhhccccccccccccccccccccccccc 

SEQ GTAPRSQSRLSVCPSTQDICRICHCEGDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDT 

SEG 

PRD ccccccccccccccccceeeeeeecccccccccccccccccceeeeehhhhhhhhhcccc 

SEQ RCCELCKYDFIMETKLKPLRKWEKLQMTTSERRKIFCSVTFHVIAITCVVWSLYVLIDRT

SEG 

PRD ceeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

SEQ AEEIKQGNDNGVLEWPFWTKLVVVAIGFTGGLVFMYVQCKVYVQLWRRLKAYNRVIFVQN

SEG 

PRD ccccccccccceeehhhhheeeeeeecccccceeeeehhhhhhhhhhhhhhhheeeeeee 

SEQ CPDTAKKLEKNFSCNVNTDIKDAVVVPVPQTGANSLPSAEGGPPEVVSV

SEG 

PRD ccchhhhhhccccccccccceeeeeeecccccccccccccccccccccc 


Prosite for DKFZphfbr2_82ml6.3


PS00001 17->21 ASN_GLYCOSYLATION PDOC00001
PS00001 51->55 ASN_GLYCOSYLATION PDOC00001
PS00001 251->255 ASN_GLYCOSYLATION PDOC00001
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005
PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 180->184 CK2_PHOSPHO_SITE PDOC00006
PS00007 121->129 TYR_PHOSPHO_SITE PDOC00007
PS00008 187->193 MYRISTYL PDOC00008


(No Pfam data available for DKFZphfbr2_82ml6.3) 
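
The ASN_GLYCOSYLATION hits listed above can be reproduced with a regular-expression scan. The sketch below is illustrative only and is not the Prosite scanning software. The pattern N-{P}-[ST]-{P} is the PS00001 consensus as published in the PROSITE database and is not given in this report. Reporting each hit as a 1-based start together with the position just past the last matched residue is an assumption that happens to reproduce the "17->21" style of the table.

```python
import re

# PS00001 (N-{P}-[ST]-{P}) written as a regular expression; the lookahead
# allows overlapping matches to be found.
ASN_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def scan_n_glycosylation(peptide):
    """Return (start, end) pairs: 1-based start, end = start + motif length."""
    hits = []
    for m in ASN_GLYCOSYLATION.finditer(peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits

# First 60 residues of the frame 3 peptide of DKFZphfbr2_82ml6.
peptide = "MLGWCEAIARNPHRIPNNTRTPEISGDLADASQTSTLNEKSPGRSASRSSNISKASSPTT"

hits = scan_n_glycosylation(peptide)
print(hits)  # [(17, 21), (51, 55)]
```

The two hits correspond to the first two PS00001 rows; the third site (251->255) lies beyond the 60 residues scanned here.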
DKFZphfbr2_82m6


group: signal transduction 

DKFZphfbr2_82m6.3 encodes a novel 654 amino acid protein with similarity to murine sphingosine
kinase.

Sphingosine kinase is a new type of lipid kinase, which is regulated by growth factors. The
enzyme phosphorylates sphingosine, which subsequently exerts intracellular and extracellular

actions. Intracellularly, sphingosine 1-phosphate (SPP) promotes proliferation and inhibits
apoptosis. In yeast, survival of cells exposed to heat shock is dependent on SPP.
Extracellularly, SPP inhibits cell motility and influences cell morphology, effects that appear
to be mediated by the G protein-coupled receptor EDG1.

The new protein can find application in modulating/blocking the sphingosine kinase
intracellular signal transmission pathway.


strong similarity to mouse "sphingosine kinase" 

complete cDNA, complete cds, EST hits, 
YLR260w/YOR171c Lcb5p/Lcb4p = long chain base kinases,
involved in biosynthesis of sphingolipids 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2875 bp 

Poly A stretch at pos. 2865, polyadenylation signal at pos. 2838 


1 AGTGTTGGAG GTGAGGAGGC GGGGCTGGCA GGGCTAGTCG GGGCATCTGG 
51 AAATTTCCGA CCCCACGCTT CGGGCGTTTC CTTATCAGGT TCACCGCTCC 
101 CTGATCTCGC GCTGCACTTC GTAGGCGCAG CCGCTGCTTG GGAAGTCCTA 
151 CTTAAGAGCT GAAGGTCAGG CCAGGACAGT GAGACCTGAC TCCTTGCTCC 
201 TACCAGCCTA CTATGGCTTA AGACCCAGGG CCAGGGTCCC GTTGATGTAA 
251 CAGAGCAGAG GACCAGCAGA TGAATGGACA CCTTGAAGCA GAGGAGCAGC 
301 AGGACCAGAG GCCAGACCAG GAGCTGACCG GGAGCTGGGG CCACGGGCCT 
351 AGGAGCACCC TGGTCAGGGC TAAGGCCATG GCCCCGCCCC CACCGCCACT 
401 GGCTGCCAGC ACCTCGCTCC TCCATGGCGA GTTTGGCTCC TACCCAGCCC 
451 GAGGCCCACG CTTTGCCCTC ACCCTTACAT CGCAGGCCCT GCACATACAG 
501 CGGCTGCGCC CCAAACCTGA AGCCAGGCCC CGGGGTGGCC TGGTCCCGTT 
551 GGCCGAGGTC TCAGGCTGCT GCACCCTGCG AAGCCGCAGC CCCTCAGACT 
601 CAGCGGCCTA CTTCTGCATC TACACCTACC CTCGGGGCCG GCGCGGGGCC 
651 CGGCGCAGAG CCACTCGCAC CTTCCGGGCA GATGGGGCCG CCACCTACGA 
701 AGAGAACCGT GCCGAGGCCC AGCGCTGGGC CACTGCCCTC ACCTGTCTGC 
751 TCCGAGGACT GCCACTGCCC GGGGATGGGG AGATCACCCC TGACCTGCTA 
801 CCTCGGCCGC CCCGGTTGCT TCTATTGGTC AATCCCTTTG GGGGTCGGGG 
851 CCTGGCCTGG CAGTGGTGTA AGAACCACGT GCTTCCCATG ATCTCTGAAG 
901 CTGGGCTGTC CTTCAACCTC ATCCAGACAG AACGACAGAA CCACGCCCGG 
951 GAGCTGGTCC AGGGGCTGAG CCTGAGTGAG TGGGATGGCA TCGTCACGGT 
1001 CTCGGGAGAC GGGCTGCTCC ATGAGGTGCT GAACGGGCTC CTAGATCGCC 
1051 CTGACTGGGA GGAAGCTGTG AAGATGCCTG TGGGCATCCT CCCCTGCGGC 
1101 TCGGGCAACG CGCTGGCCGG AGCAGTGAAC CAGCACGGGG GATTTGAGCC 
1151 AGCGCTGGGC CTCGACCTGT TGCTCAACTG CTCACTGTTG CTGTGCCGGG 
1201 GTGGTGGCCA CCCACTGGAC CTGCTCTCCG TGACGCTGGC CTCGGGCTCC 
1251 CGCTGTTTCT CCTTCCTGTC TGTGGCCTGG GGCTTCGTGT CAGATGTGGA
1301 TATCCAGAGC GAGCGCTTCA GGGCCTTGGG CAGTGCCCGC TTCACACTGG
1351 GCACGGTGCT GGGCCTCGCC ACACTGCACA CCTACCGCGG ACGCCTCTCC 
1401 TACCTCCCCG CCACTGTGGA ACCTGCCTCG CCCACCCCTG CCCATAGCCT 
1451 GCCTCGTGCC AAGTCGGAGC TGACCCTAAC CCCAGACCCA GCCCCGCCCA 
1501 TGGCCCACTC ACCCCTGCAT CGTTCTGTGT CTGACCTGCC TCTTCCCCTG 
1551 CCCCAGCCTG CCCTGGCCTC TCCTGGCTCG CCAGAACCCC TGCCCATCCT 
1601 GTCCCTCAAC GGTGGGGGCC CAGAGCTGGC TGGGGACTGG GGTGGGGCTG 
1651 GGGATGCTCC GCTGTCCCCG GACCCACTGC TGTCTTCACC TCCTGGCTCT 
1701 CCCAAGGCAG CTCTACACTC ACCCGTCTCC GAAGGGGCCC CCGTAATTCC 
1751 CCCATCCTCT GGGCTCCCAC TTCCCACCCC TGATGCCCGG GTAGGGGCCT 
1801 CCACCTGCGG CCCGCCCGAC CACCTGCTGC CTCCGCTAGG CACCCCGCTG 
1851 CCCCCAGACT GGGTGACGCT GGAGGGGGAC TTTGTGCTCA TGTTGGCCAT 
1901 CTCGCCCAGC CACCTAGGCG CTGACCTGGT GGCAGCTCCG CATGCGCGCT 
1951 TCGACGACGG CCTGGTGCAC CTGTGCTGGG TGCGTAGCGG CATCTCGCGG 
2001 GCTGCGCTGC TGCGCCTTTT CTTGGCCATG GAGCGTGGTA GCCACTTCAG 
2051 CCTGGGCTGT CCGCAGCTGG GCTACGCCGC GGCCCGTGCC TTCCGCCTAG 
2101 AGCCGCTCAC ACCACGCGGC GTGCTCACAG TGGACGGGGA GCAGGTGGAG 
2151 TATGGGCCGC TACAGGCACA GATGCACCCT GGCATCGGTA CACTGCTCAC 
2201 TGGGCCTCCT GGCTCCCCGG GGCGGGAGCC CTGAAACTAA ACAAGCTTGG 
2251 TACCCGCCGG GGGCGGGGCC TACATTCCAA TGGGGCGGAG CCTGAGCTAG 
2301 GGGGTGTGGC CTGGCTGCTA GAGTTGTGGT GGCAGGGGCC CTGGCCCCGT 
2351 CTCAGGATTG CGCTCGCTTT CATGGGACCA GACGTGATGC TGGAAGGTGG
2401 GCGTCGTCAC GGTTAAAGAG AAATGGGCTC GTCCCGAGGG TAGTGCCTGA 
2451 TCAATGAGGG CGGGGCCTGG CGTCTGATCT GGGGCCGCCC TTACGGGGCA 
2501 GGGCTCAGTC CTGACGCTTG CCACCTGCTC CTACCCGGCC AGGATGGCTG 
2551 AGGGCGGAGT CTATTTTACG CGTCGCCCAA TGACAGGACC TGGAATGTAC 
2601 TGGCTGGGGT AGGCCTCAGT GAGTCGGCCG GTCAGGGCCC GCAGCCTCGC 
2651 CCCATCCACT CCGGTGCCTC CATTTAGCTG GCCAATCAGC CCAGGAGGGG 
2701 CAGGTTCCCC GGGGCCGGCG CTAGGATTTG CACTAATGTT CCTCTCCCCG 
2751 CGGGTGGGGG CGGGGAAATT CATATCCCCT GTTCGTCTCA TGCGCGTCCT 
2801 CCGTCCCCAA TCTAAAAAGC AATTGAAAAG GTCTATGCAA TAAAGGCAGT 
2851 CGCTTCATTC CTCTCAAAAA AAAAA 


BLAST Results 


NO BLAST result 


Medline entries 


99045661: 

Tumor necrosis factor-alpha induces adhesion molecule 
expression through the sphingosine kinase pathway. 

98395082: 

Molecular cloning and functional characterization 
of murine sphingosine kinase. 

98241633: 

Purification and characterization of rat kidney sphingosine kinase.

99178622:

Sphingosine 1-phosphate: a prototype of a new class of second
messengers.


Peptide information for frame 3 


1 MNGHLEAEEQ QDQRPDQELT GSWGHGPRST LVRAKAMAPP PPPLAASTSL 
51 LHGEFGSYPA RGPRFALTLT SQALHIQRLR PKPEARPRGG LVPLAEVSGC 
101 CTLRSRSPSD SAAYFCIYTY PRGRRGARRR ATRTFRADGA ATYEENRAEA
151 QRWATALTCL LRGLPLPGDG EITPDLLPRP PRLLLLVNPF GGRGLAWQWC 
201 KNHVLPMISE AGLSFNLIQT ERQNHARELV QGLSLSEWDG IVTVSGDGLL 
251 HEVLNGLLDR PDWEEAVKMP VGILPCGSGN ALAGAVNQHG GFEPALGLDL 
301 LLNCSLLLCR GGGHPLDLLS VTLASGSRCF SFLSVAWGFV SDVDIQSERF 
351 RALGSARFTL GTVLGLATLH TYRGRLSYLP ATVEPASPTP AHSLPRAKSE 
401 LTLTPDPAPP MAHSPLHRSV SDLPLPLPQP ALASPGSPEP LPILSLNGGG 
451 PELAGDWGGA GDAPLSPDPL LSSPPGSPKA ALHSPVSEGA PVIPPSSGLP 
501 LPTPDARVGA STCGPPDHLL PPLGTPLPPD WVTLEGDFVL MLAISPSHLG
551 ADLVAAPHAR FDDGLVHLCW VRSGISRAAL LRLFLAMERG SHFSLGCPQL 
601 GYAAARAFRL EPLTPRGVLT VDGEQVEYGP LQAQMHPGIG TLLTGPPGCP 
651 GREP 


ORF from 270 bp to 2231 bp; peptide length: 654 
Category: similarity to known protein 
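
The frame 3 peptide can be recovered from the cDNA by translating codons from the ORF start. The following sketch uses the standard genetic code and translates the first 30 nucleotides of the ORF (positions 270 to 299 of the sequence above). It is a minimal illustration, not the software used to generate this report, and the function name translate is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)

# Nucleotides 270-299 of the DKFZphfbr2_82m6 insert.
orf_start = "ATGAATGGACACCTTGAAGCAGAGGAGCAG"
print(translate(orf_start))  # MNGHLEAEEQ
```

The output matches the first ten residues of the peptide listed above.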


BLASTP hits 
Entry SPAC4A8_7 from database TREMBL: 

gene: "SPAC4A8.07c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c4A8.

Score = 301, P = 7.9e-32, identities = 68/190, positives = 109/190

Entry CEC34C6_3 from database TREMBLNEW:

product: "C34C6.5"; Caenorhabditis elegans cosmid C34C6

>TREMBL:CEC34C6_3 product: "C34C6.5"; Caenorhabditis elegans cosmid

Score = 273, P = 9.0e-29, identities = 78/265, positives = 142/265
Entry S67059 from database PIR:

hypothetical protein YOR171c - yeast (Saccharomyces cerevisiae)
>TREMBL:SC5 5021_9 gene: "03615"; product: "03615p"; Saccharomyces
cerevisiae cosmid pUOA1258 from chromosome 15R, >TREMBL:SCYOR170W_2
S. cerevisiae chromosome XV reading frame ORF YORnOw
Score = 253, P = 2.0e-25, identities = 70/234, positives = 116/234
Entry S51398 from database PIR:

hypothetical protein YLR260w - yeast (Saccharomyces cerevisiae)
>TREMBL:SCL8479_4 gene: "YLR260W"; product: "Ylr260wp"; Saccharomyces
cerevisiae chromosome XII cosmid 8479.

Score = 251, P = 1.0e-24, identities = 62/198, positives = 103/198


Alert BLASTP hits for DKFZphfbr2_82m6, frame 3 

TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1b) mRNA, complete cds., N = 2, Score
= 615, P = 1.2e-92

TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds., N = 2, Score =
616, P = 2e-92

TREMBL:ATF18E5_16 gene: "F18E5.160"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F18E5 (ESSAII
project), N = 2, Score = 370, P = 6.8e-33


>TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds.
Length = 504

HSPs:

Score = 616 (92.4 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 128/260 (49%), Positives = 173/260 (66%)


Query: 154 ATALTCLLRGLPLPGDGEITPDLLPRPPRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGL 213

A C L + E LLPRP R+L+L+NP GG+G AQ ++VP + EA +

Sbjct: 110 APVAPCQREPRDLAMEPECPRGLLPRPCRVLVLLNPQGGKGKALQLFQSRVQPFLEEAEI 169

Query: 214 SFNLIQTERQNHARELVQGLSLSEWDGIVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGI 273

+F LI TER+NHARELV L WD + +SGDGL+HEV+NGL++RPDWE A++ P+
Sbjct: 170 TFKLILTERKNHARELVCAEELGHWDALAVMSGDGLMHEVVNGLMERPDWETAIQKPLCS 229

Query: 274 LPCGSGNALAGAVNQHGGFEPALGLDLLLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFL 333

LP GSGNALA +VN + G+E DLL+NC+LLLCR P++LLS+ ASG R +S L

Sbjct: 230 LPGGSGNALAASVNHYAGYEQVTNEDLLINCTLLLCRRRLSPMNLLSLHTASGLRLYSVL 289

Query: 334 SVAWGFVSDVDIQSERFRALGSARFTLGTVLGLATLHTYRGRLSYLPA-TVEPASPTPAH 392

S++WGFV+DVD++SE++R LG RFT+GT LA+L Y+G+L+YLP TV AS PA
Sbjct: 290 SLSWGFVADVDLESEKYRRLGEIRFTVGTFFRLASLRIYQGQLAYLPVGTV--ASKRPAS 347

Query: 393 SL-PRAKSELTLTPDPAPPMAH 413

+L + + L P P +H
Sbjct: 348 TLVQKGPVDTHLVPLEEPVPSH 369

Score = 324 (48.6 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 72/160 (45%), Positives = 100/160 (62%)


Query: 499 LPLPTPDARVGASTC---GPPDHLLPPLGTPLPPDWVTL-EGDFVLMLAISPSHLGADLV 554

LP+ T ++ AST GP D L PL P+P W + E DF+L+L + +HL ++L
Sbjct: 335 LPVGTVASKRPASTLVQKGPVDTHLVPLEEPVPSHWTVVPEQDFLLVLVLLHTHLSSELF 394

Query: 555 AAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQLGYAAARAFRLEPLT 614

AAP R + G++HL +VR+G+SRAALLRLFLAM++G H L CP L + AFRLEP +
Sbjct: 395 AAPMGRCEAGVMHLFYVRAGVSRAALLRLFLAMQKGKHMELDCPYLVHVPVVAFRLEPRS 454

Query: 615 PRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCP-GRE 653

RGV +VDGE + +Q Q+HP ++ G P GR+
Sbjct: 455 QRGVFSVDGELMVCEAVQGQVHPNYLWMVCGSRDAPSGRD 494


Score = 37 (5.6 bits), Expect = 3.6e-62, Sum P(2) = 3.6e-62
Identities = 8/20 (40%), Positives = 9/20 (45%)

Query: 459 GAGDAPLSPDPLLSSPPGSP 478

G+ DAP D PP P

Sbjct: 485 GSRDAPSGRDSRRGPPPEEP 504


Pedant information for DKFZphfbr2_82m6, frame 3 


Report for DKFZphfbr2_82m6.3

[LENGTH] 654
[MW] 69207.45
[pI] 6.47
[HOMOL] TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus musculus
sphingosine kinase (SPHK1b) mRNA, complete cds. 2e-50
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YLR260w]
4e-20
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 20.18 %

SEQ MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAASTSLLHGEFGSYPA


SEG . xxxxxxxxxxxxx .

PRD ccchhhhhhhhcccccceeecccccccceeehhhhhccccccceeeceeeeccccccccc 

SEQ RGPRFALTLTSQALHIQRLRPKPEARPRGGLVPLAEVSGCCTLRSRSPSDSAAYFCIYTY

SEG 

PRD cccceeehhhhhhhhhhhhhccccccccccceeeeeeeceeeeeecccccceeeeeeeec 

SEQ PRGRRGARRRATRTFRADGAATYEENRAEAQRWATALTCLLRGLPLPGDGEITPDLLPRP 

SEG , xxxxxxxxxxxxxxxxxxxxx xxxxx 

PRD ccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGLSFNLIQTERQNHARELVQGLSLSEWDG 

SEG xxxxxx 

PRD ceeeeeeecccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhh^ 

SEQ IVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGILPCGSGNALAGAVNQHGGFEPALGLDL

PRD eeeecccccceeeccccccccchhhhhccceeeccccccccccccccccccccchhhhhh 

SEQ LLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFLSVAWGFVSDVDIQSERFRALGSARFTL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhccccccccccceeeeeeccccceeeeeeeeccccceeeehhhhhhhhhhhhhhc 

SEQ GTVLGLATLHTYRGRLSYLPATVEPASPTPAHSLPRAKSELTLTPDPAPPMAHSPLHRSV

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 


SEQ SDLPLPLPQPALASPGSPEPLPILSLNGGGPELAGDWGGAGDAPLSPDPLLSSPPGSPKA 

SEG . . xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccce 

SEQ ALHSPVSEGAPVIPPSSGLPLPTPDARVGASTCGPPDHLLPPLGTPLPPDWVTLEGDFVL 

SEG XX xxxxxxxxxxxxxxx 

PRD eeccccccccccccccccccccccccccccccccccccccccccccccccccccccccee


SEQ MLAISPSHLGADLVAAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQL

SEG , 

PRD eeeeecccccccccccccccccccceeeeeeeccchhhhhhhhhhhhhcccceeecccch 

SEQ GYAAARAFRLEPLTPRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCPGREP 

SEG xxxxxxxxxxxxxxx . . .

PRD hhhhhhhhhhccccccceeeeccceeecccccccccccccceeecccccccccc 


Prosite for DKFZphfbr2_82m6.3


PS00001 303->307 ASN_GLYCOSYLATION PDOC00001
PS00002 245->249 GLYCOSAMINOGLYCAN PDOC00002
PS00004 129->133 CAMP_PHOSPHO_SITE PDOC00004
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 134->137 PKC_PHOSPHO_SITE PDOC00005
PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005
PS00005 347->350 PKC_PHOSPHO_SITE PDOC00005
PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005
PS00005 371->374 PKC_PHOSPHO_SITE PDOC00005
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005
PS00005 614->617 PKC_PHOSPHO_SITE PDOC00005
PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006
PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006
PS00006 341->345 CK2_PHOSPHO_SITE PDOC00006
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006
PS00007 106->115 TYR_PHOSPHO_SITE PDOC00007
PS00008 56->62 MYRISTYL PDOC00008
PS00008 212->218 MYRISTYL PDOC00008
PS00008 232->238 MYRISTYL PDOC00008
PS00008 272->278 MYRISTYL PDOC00008
PS00008 277->283 MYRISTYL PDOC00008
PS00008 279->285 MYRISTYL PDOC00008
PS00008 361->367 MYRISTYL PDOC00008
PS00008 476->482 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 574->580 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00008 640->646 MYRISTYL PDOC00008
PS00009 122->126 AMIDATION PDOC00009


(No Pfam data available for DKFZphfbr2_82m6.3)

DKFZphfkd2_lj9


group: kidney derived 

DKFZphfkd2_lj9 encodes a novel 105 amino acid protein with high similarity to Xenopus laevis
XLCL2 protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.

strong similarity to XLCL2 protein, African clawed frog 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: unknown

Insert length: 2955 bp

Poly A stretch at pos. 2935, polyadenylation signal at pos. 2915 

1 GGGGGGGGCT GAGTGCTCAG TGGAGAGCGG GGAGTTGTGT CCACCTTGCC 
51 GACGTCGCTA GCCGTGGGGC TGTCCTGGGA AGGCGGACGG CGAGCGCCCG 
101 GTGTCCGCAC TCGGCCGCCT GCCGTGCCCG TCTGCGCCCG TGTCATCCTC 
151 ACTCGGGACG CAGGGACCGT TTTTAAATCA CAGGGGCGTG TGTCAGCCTG 
201 CCCTAGGACT TCATGTCTAT ATATTTCCCC ATTCACTGCC CCGACTATCT 
251 GAGATCGGCC AAGATGACTG AGGTGATGAT GAACACCCAG CCCATGGAGG 
301 AGATCGGCCT CAGCCCCCGC AAGGATGGCC TTTCCTACCA GATCTTCCCA 
351 GACCCGTCAG ATTTTGACCG CCGCTGCAAA CTGAAGGACC GTCTGCCCTC 
401 CATAGTGGTG GAACCCACAG AAGGGGAGGT GGAGAGCGGG GAGCTCCGGT 
451 GGCCCCCTGA GGAGTTCCTG GTCCAGGAGG ATGAGCAAGA TAACTGCGAA 
501 GAGACAGCGA AAGAAAATAA AGAGCAGTAG AGTCCCTGTG GACTCCCATG 
551 GGTCATACCA GCCAGCATCT GTTCCTGAAC TGTGTTTTTC CCATCATGAC 
601 GGAAGAAGAG AGTGAGCCGC AATTGTTCTG AAAATGTCAA ACGAGGCTTC 
651 TGTTTTGCAC CTGCAGATCA CCGAGTTGGT TTTCTTTTCT TTTCTTGCCT 
701 TTTTTTTTTT TTTGAAATTT GCCGAGCAGT GGAGCCCTCT GACAATTTGC 
751 AAGGCCCTCT GAGAAAGGAA GCTGCTTAGA GCCAGGGGGT TAGTGGGTGA 
801 GGGGAGCGAG TGCTGTTTTT GAGATCATTA TCTGAACTCA GGCAGCCTAG 
851 TAGAGGCAGT GGTGGGATTC CAATGGGTCT TGGTGGGTGG GAGGTGGGGC 
901 ATGTGCAAAG CAAGCAAGGA ACATTTGGGG TAAGAAAACA AACATGAGGC 
951 AAAAGAAAAA ATACATGTTT TTAAGAAAAC ATTGAGCAGA GAACTGCAGC 
1001 CAGGATGCGC TCAGCAGACA TTCACTCTGG CCGCTGGGAC ATCAGAAAAC 
1051 AAAGTCTTCA TCTCTCTCTC CAGTTTCACC CACCCCACCC TTTGCTTTCA 
1101 TTTCAGGTGT GTTGGTCTAT ATGACAGGGA GGAGAGTAAA GGAGAGCAGG 
1151 AGCAATTGGC TGCCTGCAAA GCCAGCTGGA GGTGAAGTGC AGGAAAGGAA 
1201 AGGTCACCCC ATTCTACTCC ATGGCCTCTC TGCTCCCAGC TGTGGTAGGC 
1251 TCACATAGCC AGTGTGATCG GTTTTTAAGA GGCAGTGCTT TTCAGCTTTT 
1301 CTCCCTGATA TATCCATTTT GCTTCCCAGC ACTTTTTAGG AGTAGTGAGA 
1351 GCACTTCCTG CCCTTGTTGG AAGCCCCAGG GTGGACACTC AGCACGAAGG 
1401 TCTCTCCCTT AACTGCTGCC CTTCCAAGAC TTGCTCCCGA GATGGAGTGG 
1451 GCGTGGTCTT CCAGGCTGGC CCTTCCTTCT CCTCACCGCC ACCTTCCCTG 
1501 CCCCAGCCCC AGCAGCCATG GGTACATGGG TCCCCAGCTC ACCTATGGAT 
1551 TCCCGCCAGT CTGCCCAGCT GCAGTACTCA CGCCCCATGG GGGATCTTGG 
1601 TGTGTTTTTC TTGTGGGAGC CTAGTGGAGA GCAGACGTGG CTTTTTATGT 
1651 GTCTTGTTGG GGAGGTGACT TGCATGGTGG GGACAAGGCT GTCGTGGCAA 
1701 CCTTGGGATC GAGTTTGAGA CTAAAGGATG TCATGAGATC CCTGGCTTCT 
1751 CCCCATGTTG TTCCCGGACA AGGGCAGAAG GGAGGCATGG CAAGGGACCT 
1801 CTGCTGTCCT TACTCAACAG TGGTCCTCAT CCCTCCCCAC CTCCCACTGC 
1851 TTCCTGCAAG GGCACCAGTT GTATGAGAAA GTTGGCCTTT GGACTTAGGA 
1901 TTTCTTATTG TAGCTAAGAG CCATCTGAAG CAGCAGGTTG CAGGACAAAT 
1951 GCTTCAGTCC GCCGAGAGCA GTACCGTGTG GCCAAGAGGT GGACTCAGAG 
2001 CCTTCCTTGA GCTAAACTCG GCCAACCAAG GCACGCAGCA TGTCCCCTCA 
2051 GGTCTCCAGT CAGTCCAGGT TGACCCTCAG TTCTGGACGT GTGTATATAG 
2101 CTGTATTTAA TACCTCAAGG TCATTGTGGC TCTGGGGATG CCAGGGCAGG 
2151 AGGACGAGGG TGCGCTGTGG ACACAGCAGT CCGCGGAATT CCGTTCTGGG 
2201 AAGCCAATGG TCGCCGGCAC CCCTTGCTTC CTCCCTCTGT TGTCTGCCTG 
2251 TGTGACACAC ATCAATGGCA ATAACTTCTT CCAACTCCTC GCAGAAGTGG 
2301 GAGAGGCCGG CAGCCTGCAC CGAGAGGGGC TTTCCTCTCT CTTGCTCCCC 
2351 GCTTCGTTCT GTTTTGGCTG CAGAGAGTGG TTCATCCATA CTCTCATTCC 
2401 CTCGCCTCCC CTTGTGGACG GGGGTCTTGC CTTTTCAATT CCTGTGTTTT
2451 GGTGTCTTCC CTTATCTGCT ACCCTGAATC ACCTGTCCTG GTCTTGCTGT 
2501 GTGATGGGAA CATGCTTGTA AACTGCGTAA CAAATCTACT TTGTGTATGT 
2551 GTCTGTTTAT GGGGGTGGTT TATTATTTTT GCTGGTCCCT AGACCACTTT 
2601 GTATGACCGT TTGCAGTCTG AGCAGGCCAG GGGCTGACAG CTAATGTCAG 
2651 GACCCTCAGC GGTGGAGCCT GCTGGGGGGA CCCAGCTGCT CTTGGACAAG 
2701 TGGCTGAGCT CCTATCTGGC CTCCTCTTTT txTTTTTTTT CAAGTAATTT
2751 GTGTGTATTT CTAACTGATT GTATTGAAAA AATTCCTAGT ATTTCAGTAA 
2801 AAATGCCTGT TGTGAGATGA ACCTCCTGTA ACTTCTATCT GTTCTTTTTT 
2851 GAGGCTCAGG GAGAAACTAG CATTTTTTTT TTTCCAAACT ACTTTTTGTC 
2901 ACTGTGACAG TTGTAAATAA AGTTTGAAAA TGCTCAAAAA AAAAAAAAAA 
2951 AAAAC 


BLAST Results 


Entry HSG19750 from database EMBL: 
human STS. A001X24. 
Score = 1050, P = 1.9e-39, identities = 212/213

Entry HSG20267 from database EMBL:
human STS A005C12.
Score = 610, P = 4.1e-19, identities = 122/122


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 213 bp to 527 bp; peptide length: 105 
Category: strong similarity to known protein 
Classification: unset 


1 MSIYFPIHCP DYLRSAKMTE VMMNTQPMEE IGLSPRKDGL SYQIFPDPSD 
51 FDRRCKLKDR LPSIVVEPTE GEVESGELRW PPEEFLVQED EQDNCEETAK 
101 ENKEQ 

BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphfkd2_lj9, frame 3

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P = 8e-42

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P = 8.2e-42


>PIR:S52241 XLCL2 protein - African clawed frog
Length = 102


HSPs:


Score = 443 (66.5 bits), Expect = 8.0e-42, P = 8.0e-42
Identities = 80/104 (76%), Positives = 95/104 (91%)


Query: 1 MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 60 

MS+++PIHC DYLRSA+MTEV+MNTQ M+EIGLSPRKD SYQIFPDPSDF+R CKLKDR 
Sbjct: 1 MSVFYPIHCTDYLRSAEMTEVIMNTQSMDEIGLSPRKD--SYQIFPDPSDFERCCKLKDR 58

Query: 61 LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKE 104

LPSIVVEPTEG+VESGELRWPPEEF+V ED++ C ++T KEN++
Sbjct: 59 LPSIVVEPTEGDVESGELRWPPEEFVVDEDKEGTCDQTKKENEQ 102

Pedant information for DKFZphfkd2_lj9, frame 3


Report for DKFZphfkd2_lj9.3


[LENGTH] 105
[MW] 12269.78
[pI] 4.40
[HOMOL] PIR:S52241 XLCL2 protein - African clawed frog 5e-44
[KW] Alpha_Beta

SEQ MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 
PRD cccccccccccchhhhhhhhhhhhcccccccccccccccceeeecccccccchhhhhhhc 

SEQ LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKEQ
PRD ccceeeecccccccccccccccccceeeccccchhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_lj9.3)
(No Pfam data available for DKFZphfkd2_lj9.3)

DKFZphfkd2_24al5


group: transmembrane protein 

DKFZphfkd2_24al5 encodes a novel 323 amino acid protein with similarity to C. elegans cosmid
R07G3.

The novel protein contains 1 transmembrane region.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 


similarity to C. elegans R07G3.8 
membrane regions : 1 

Summary DKFZphfkd2_24al5 encodes a novel 323 amino acid protein, with
similarity to C. elegans R07G3.8. 


similarity to C. elegans R07G3.8 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1513 bp 

Poly A stretch at pos. 1494, no polyadenylation signal found 

1 GGGGTACTCG GCGGCGGCGG AGCGGGCGGC AGAGCAGGGC GGCGGCGACT 
51 CGCAGGGTAC CACCATCTTA AGGACAGAAA AGCTACAGGA CTCTAGGAGG 
101 CCACCGTCCT GATTTGGGAA GTCCAACTTA CTTTGGCCAG ACAGCAGCTA 
151 AGCTGGTTCA TCCCATCAGC CTGGATTGGT GAAACTGAAT CACAGGAGAT 
201 ATTTCCAGGT TTGCTGGGAT GGGAAACCTG CTCAAAGTCC TTACCAGGGA 
251 AATTGAAAAC TATCCACACT TTTTCCTGGA TTTTGAAAAT GCTCAGCCTA 
301 CAGAAGGAGA GAGAGAAATC TGGAACCAGA TCAGCGCCGT CCTTCAGGAT
351 TCTGAGAGCA TCCTTGCAGA CCTGCAGGCT TACAAAGGCG CAGGCCCAGA 
401 GATCCGAGAT GCAATTCAAA ATCCCAATGA CATTCAGCTT CAAGAAAAAG 
451 CTTGGAATGC GGTGTGCCCT CTTGTTGTGA GGCTAAAGAG ATTTTACGAG
501 TTTTCCATTA GACTAGAAAA AGCTCTTCAG AGTTTATTGG AATCTCTGAC 
551 TTGTCCACCC TACACACCAA CCCAACACCT GGAAAGGGAA CAGGCCCTGG 
601 CAAAGGAGTT TGCCGAAATT TTACATTTTA CCCTTCGATT CGATGAGCTG 
651 AAGATGAGGA ACCCGGCTAT TCAGAATGAC TTCAGCTACT ACAGAAGAAC 
701 AATCAGTCGC AACCGCATCA ACAACATGCA CCTAGACATT GAGAATGAAG 
751 TCAATAATGA GATGGCCAAT CGAATGTCCC TCTTCTATGC AGAAGCCACG 
801 CCAATGCTGA AAACCCTTAG CAATGCCACA ATGCACTTTG TCTCTGAAAA 
851 CAAAACTCTG CCAATAGAGA ACACCACAGA CTGCCTCAGC ACAATGACAA 
901 GTGTCTGTAA AGTCATGCTG GAAACTCCGG AGTACAGAAG TAGGTTTACG 
951 AGTGAAGAGA CCCTGATGTT CTGCATGAGG GTGATGGTGG GAGTCATCAT 
1001 CCTCTATGAC CATGTCCACC CTGTGGGAGC TTTCTGCAAG ACATCCAAGA 
1051 TCGATATGAA AGGCTGCATA AAAGTTTTGA AGGAGCAGGC CCCAGACAGT 
1101 GTGGAGGGGC TGCTAAATGC CCTCAGGTTC ACTACAAAGC ACTTGAACGA 
1151 TGAATCAACT TCCAAACAGA TTCGAGCAAT GCTTCAGTAG AGCTCTGCTC 
1201 AAAGAAGAGG ATCTATGTGC TGACCTCAGA AGATGTATAT GTTTACATAA 
1251 TTTAATACAG ATTGATGTTA ATACTTGTGT ATTTACATAA CCGTTTCCTT 
1301 CTTGTCACTG AAATATATGG ACCTTAATTT GTATCCTGAC TGACTCAACC 
1351 CAGCAGAGCA TAAATTGACT TGAGAGCCTT ACCTTTGATG TCTGAAATGA 
1401 AACCCCCTTC TCCAAAGGCA AAATTCGGAG ACTTTGATCT TTGCTACTGG 
1451 AGTCCTTTAA CAACATCTAT AACGATAAAA AATTCCTAAT TGTCAAAAAA 
1501 AAAAAAAAAA AAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 
ORF from 219 bp to 1187 bp; peptide length: 323
Category: similarity to unknown protein 


1 MGNLLKVLTR EIENYPHFFL DFENAQPTEG EREIWNQISA VLQDSESILA 

51 DLQAYKGAGP EIRDAIQNPN DIQLQEKAWN AVCPLVVRLK RFYEFSIRLE 

101 KALQSLLESL TCPPYTPTQH LEREQALAKE FAEILHFTLR FDELKMRNPA 

151 IQNDFSYYRR TISRNRINNM HLDIENEVNN EMANRMSLFY AEATPMLKTL 

201 SNATMHFVSE NKTLPIENTT DCLSTMTSVC KVMLETPEYR SRFTSEETLM 

251 FCMRVMVGVI ILYDHVHPVG AFCKTSKIDM KGCIKVLKEQ APDSVEGLLN 

301 ALRFTTKHLN DESTSKQIRA MLQ 

BLASTP hits 
Entry CER07G3_7 from database TREMBL: 

gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 

Score - 544, P •= 1.4e-52, identities = 119/323, positives - 186/323 


Alert BLASTP hits for DKFZphfkd2_24al5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24al5, frame 3 


Report for DKFZphfkd2_24al5.3 


[LENGTH] 323 

[MW] 37313.06 

[pI] 5.71 

[HOMOL] TREMBL:CER07G3_7 gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 4e-54 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 

[PROSITE] TYR_PHOSPHO_SITE 

[PROSITE] PKC_PHOSPHO_SITE 

[PROSITE] ASN_GLYCOSYLATION 

[KW] TRANSMEMBRANE 1 


SEQ MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILADLQAYKGAGP 
PRD ccccchhhhhhhhcccceeecccccccchhhhhhhhhhhhhhhcchhhhhhhhhhccccc 
MEM 

SEQ EIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLEKALQSLLESLTCPPYTPTQH 
PRD hhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhh 
MEM 

SEQ LEREQALAKEFAEILHFTLRFDELKMRNPAIQNDFSYYRRTISRNRINNMHLDIENEVNN 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhccchhhhhhhhhhhhhhhh 
MEM 

SEQ EMANRMSLFYAEATPMLKTLSNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYR 
PRD hhhhhhhhhhhhccchhhhhhhhceeecccccccccccccceeeeehhhhhhhhcccccc 
MEM 

SEQ SRFTSEETLMFCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN 
PRD cccccchhhhhhhhhhhheeeeeeeccccccccccccccchhhhhhhhhccccchhhhhh 
MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ ALRFTTKHLNDESTSKQIRAMLQ 
PRD hhhhhhcccccccchhhhhhccc 
MEM 


Prosite for DKFZphfkd2_24al5.3 


PS00001 202->206 ASN_GLYCOSYLATION PDOC00001 
PS00001 211->215 ASN_GLYCOSYLATION PDOC00001 
PS00001 218->222 ASN_GLYCOSYLATION PDOC00001 
PS00005 96->99 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 314->317 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006 
PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006 
PS00007 231->240 TYR_PHOSPHO_SITE PDOC00007 
PS00008 297->303 MYRISTYL PDOC00008 
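
The site table above can be reproduced in outline with a simple pattern scan. The following is a minimal Python sketch, not the tool that generated this listing: it assumes the standard PROSITE patterns N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005) and [ST]-x(2)-[DE] (PS00006) rewritten as regular expressions, and it assumes that the second coordinate in the table is the start position plus the pattern length. The names `PATTERNS` and `scan` are hypothetical.

```python
import re

# PROSITE patterns rewritten as regular expressions (assumed equivalents).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # [ST]-x(2)-[DE]
}


def scan(seq, offset=1):
    """Return (name, start, start + match length) for every pattern hit.

    A lookahead is used so that overlapping sites are all reported.
    `offset` is the residue number of the first character of `seq`.
    """
    hits = []
    for name, rx in PATTERNS.items():
        for m in re.finditer(f"(?=({rx}))", seq):
            start = m.start() + offset
            hits.append((name, start, start + len(m.group(1))))
    return sorted(hits)


# Residues 201-222 of the frame 3 peptide listed above.
fragment = "SNATMHFVSENKTLPIENTTDC"
for name, start, end in scan(fragment, offset=201):
    print(f"{name} {start}->{end}")
```

On this fragment the sketch reports only the three N-glycosylation sites; such short patterns occur frequently by chance, so hits of this kind are weak evidence on their own.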


(No Pfam data available for DKFZphfkd2_24al5.3) 
DKFZphfkd2_24bl5 


group: metabolism 

DKFZphfkd2_24bl5 encodes a novel 612 amino acid protein with similarity to bacterial and yeast 
phosphoglucomutases and phosphomannomutases. 

The novel protein contains a phosphoserine signature typical of phosphoglucomutase (EC 
5.4.2.2) or phosphomannomutase (EC 5.4.2.8). Thus, the protein appears to take part in the 
conversion of hexose phosphates. 

The new protein can find application in modulation of hexose metabolism pathways and as a new 
enzyme for biotechnological production processes. 


similarity to phosphomannomutases 
complete cDNA, complete cds, EST hits 

potential start at bp 30 matches Kozak consensus PyCNatgG 
Sequenced by GBF 

Locus: /map="158.8 cR from top of Chr4 linkage group" 
Insert length: 2204 bp 

Poly A stretch at pos. 2186, no polyadenylation signal found 


1 GGGCTCTGCA GCGGTAGCAC AAGCTCAGCG ATGGCGGCTC CAGAAGGCAG 
51 CGGTCTAGGC GAGGACGCCC GGCTGGACCA GGAGACCGCC CAGTGGCTGC 
101 GCTGGGACAA GAATTCCTTA ACTTTGGAGG CAGTGAAACG ACTAATAGCA 
151 GAAGGTAATA AAGAAGAACT ACGAAAATGT TTTGGGGCCC GAATGGAGTT 
201 TGGGACAGCT GGCCTCCGAG CTGCTATGGG ACCTGGAATT TCTCGTATGA 
251 ATGACTTGAC CATCATCCAG ACTACACAGG GATTTTGCAG ATACCTGGAA 
301 AAACAATTCA GTGACTTAAA GCAGAAAGGC ATCGTGATCA GTTTTGACGC 
351 CCGAGCTCAT CCATCCAGTG GGGGTAGCAG CAGAAGGTTT GCCCGACTTG 
401 CTGCAACCAC ATTTATCAGT CAGGGGATTC CTGTGTACCT CTTTTCTGAT 
451 ATAACGCCAA CCCCCTTTGT GCCCTTCACA GTATCACATT TGAAACTTTG 
501 TGCTGGAATC ATGATAACTG CATCTCACAA TCCAAAGCAG GATAATGGTT 
551 ATAAGGTCTA TTGGGATAAT GGAGCTCAGA TCATTTCTCC TCACGATAAA 
601 GGGATTTCTC AAGCTATTGA AGAAAATCTA GAACCGTGGC CTCAAGCTTG 
651 GGACGATTCT TTAATTGATA GCAGTCCACT TCTCCACAAT CCGAGTGCTT 
701 CCATCAATAA TGACTACTTT GAAGACCTTA AAAAGTACTG TTTCCACAGG 
751 AGCGTGAACA GGGAGACAAA GGTGAAGTTT GTGCACACCT CTGTCCATGG 
801 GGTGGGTCAT AGCTTTGTGC AGTCAGCTTT CAAGGCTTTT GACCTTGTTC 
851 CTCCTGAGGC TGTTCCTGAA CAGAGAGATC CGGATCCTGA GTTTCCAACA 
901 GTGAAATACC CGAATCCCGA AGAGGGGAAA GGTGTCTTGA CTTTGTCTTT 
951 TGCTTTGGCT GACAAAACCA AGGCCAGAAT TGTTTTAGCT AACGACCCGG 
1001 ATGCTGATAG ACTTGCTGTG GCAGAAAAGC AAGACAGTGG TGAATGGAGG 
1051 GTGTTTTCAG GCAATGAGTT GGGGGCCCTC CTGGGCTGGT GGCTTTTTAC 
1101 ATCTTGGAAA GAGAAGAACC AGGATCGCAG TGCTCTCAAA GACACGTACA 
1151 TGTTGTCCAG CACCGTCTCC TCCAAAATCT TGCGGGCCAT TGCCTTAAAG 
1201 GAAGGTTTTC ATTTTGAGGA AACATTAACT GGCTTTAAGT GGATGGGAAA 
1251 CAGAGCCAAA CAGCTAATAG ACCAGGGGAA AACTGTTTTA TTTGCATTTG 
1301 AAGAAGCTAT TGGATACATG TGCTGCCCTT TTGTTCTGGA CAAAGATGGA 
1351 GTCAGTGCCG CTGTCATAAG TGCAGAGTTG GCTAGCTTCC TAGCAACCAA 
1401 GAATTTGTCT TTGTCTCAGC AACTAAAGGC CATTTATGTG GAGTATGGCT 
1451 ACCATATTAC TAAAGCTTCC TATTTTATCT GCCATGATCA AGAAACCATT 
1501 AAGAAATTAT TTGAAAACCT CAGAAACTAC GATGGAAAAA ATAATTATCC 
1551 AAAAGCTTGT GGCAAATTTG AAATTTCTGC CATTAGGGAC CTTACAACTG 
1601 GCTATGATGA TAGCCAACCT GATAAAAAAG CTGTTCTTCC CACTAGTAAA 
1651 AGCAGCCAAA TGATCACCTT CACCTTTGCT AATGGAGGCG TGGCCACCAT 
1701 GCGCACCAGT GGGACAGAGC CCAAAATCAA GTACTATGCA GAGCTGTGTG 
1751 CCCCACCTGG GAACAGTGAT CCTGAGCAGC TGAAGAAGGA ACTGAATGAA 
1801 CTGGTCAGTG CTATTGAAGA ACATTTTTTC CAGCCACAGA AGTACAATCT 
1851 GCAGCCAAAA GCAGACTAAA ATAGTCCAGC CTTGGGTATA CTTGCATTTA 
1901 CCTACAATTA AGCTGGGTTT AACTTGTTAA GCAATATTTT TAAGGGCCAA 
1951 ATGATTCAAA ACATCACAGG TATTTATGTG TTTTACAAAG ACCTACATTC 
2001 CTCATTGTTT CATGTTTGAC CTTTAAGGTG AAAAAAGAAA ATGGCCAAAC 
2051 CCAACAAACT AACATTCCTA CTAAAAAGTT GAGCTTGGAC ATATTTTGAA 
2101 TTTTTGTAAG TGAAGATTTT TAAACTGACT AACTTAAAAA AATAGATTGT 
2151 AATTGATGTG CCTTAATTTG CATAAATCAT AAATGTAAAA AAAAAAAAAA 
2201 AAAA 


BLAST Results 


Entry HS705145 from database EMBL: 
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human STS wi-6820. 
Score = 1261, P = 3.6e-52, identities = 253/254 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 31 bp to 1866 bp; peptide length: 612 
Category: strong similarity to known protein 


1 MAAPEGSGLG EDARLDQETA QWLRWDKNSL TLEAVKRLIA EGNKEELRKC 
51 FGARMEFGTA GLRAAMGPGI SRMNDLTIIQ TTQGFCRYLE KQFSDLKQKG 
101 IVISFDARAH PSSGGSSRRF ARLAATTFIS QGIPVYLFSD ITPTPFVPFT 
151 VSHLKLCAGI MITASHNPKQ DNGYKVYWDN GAQIISPHDK GISQAIEENL 
201 EPWPQAWDDS LIDSSPLLHN PSASINNDYF EDLKKYCFHR SVNRETKVKF 
251 VHTSVHGVGH SFVQSAFKAF DLVPPEAVPE QRDPDPEFPT VKYPNPEEGK 
301 GVLTLSFALA DKTKARIVLA NDPDADRLAV AEKQDSGEWR VFSGNELGAL 
351 LGWWLFTSWK EKNQDRSALK DTYMLSSTVS SKILRAIALK EGFHFEETLT 
401 GFKWMGNRAK QLIDQGKTVL FAFEEAIGYM CCPFVLDKDG VSAAVISAEL 
451 ASFLATKNLS LSQQLKAIYV EYGYHITKAS YFICHDQETI KKLFENLRNY 
501 DGKNNYPKAC GKFEISAIRD LTTGYDDSQP DKKAVLPTSK SSQMITFTFA 
551 NGGVATMRTS GTEPKIKYYA ELCAPPGNSD PEQLKKELNE LVSAIEEHFF 
601 QPQKYNLQPK AD 
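
The ORF coordinates and the peptide length reported for this frame can be cross-checked arithmetically. The following Python sketch assumes that the coordinates are 1-based, inclusive, and exclude the stop codon; this convention is inferred from the numbers in this listing rather than stated in it, and the function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in a 1-based, inclusive ORF span (stop codon excluded)."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# Coordinates as reported for this clone: ORF from 31 bp to 1866 bp.
print(orf_peptide_length(31, 1866))  # 612
```

A span that is not a multiple of three would point to a misread coordinate or a frame shift.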

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24bl5, frame 1 

TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid 
Y43F4B, N = 1, Score = 1431, P = 1.6e-146 

TREMBL:SPCC1840_5 gene: "SPCC1840.05c"; product: "similarity to 
phosphomannomutases"; S.pombe chromosome III cosmid c1840., N = 1, 
Score = 1210, P = 4.2e-123 

PIR:S54585 hypothetical protein YMR278w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 1046, P = 1e-105 

PIR:A71299 probable phosphomannomutase (manB) - syphilis spirochete, N 
= 1, Score = 697, P = 9.7e-69 


>TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 
Length = 595 

HSPs: 

Score = 1431 (214.7 bits), Expect = 1.6e-146, P = 1.6e-146 
Identities = 285/598 (47%), Positives = 393/598 (65%) 

Query: 13 ARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTAGLRAAMGPGISR 72 

A+LD++ A WL WDKN +++L+ E N + L+ R+ FGTAG+R+ M G R 

Sbjct: 6 AKLDKQVADWLAWDKNDKNRNEIQKLVDEKNVDALKARMDTRLVFGTAGVRSPMQAGFGR 65 

Query: 73 MNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRFARLAATTFISQG 132 

+NDLTIIQ T GF R++ + K G+ I FD R + SRRFA L+A F+ 

Sbjct: 66 LNDLTIIQITHGFARHMLNVYGQPKN-GVAIGFDGRYN SRRFAELSANVFVRNN 118 

Query: 133 IPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDNGAQIISPHDKGI 192 

IPVYLFS+++PTP V + L AG++ITASHNPK+DNGYK YW NGAQII PHD I 
Sbjct: 119 IPVYLFSEVSPTPVVSWATIKLGCDAGLIITASHNPKEDNGYKAYWSNGAQIIGPHDTEI 178 

Query: 193 SQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHRSVNRETKVKFVH 252 

+ E +P + WD S + SSPL H+ I+ YFE K F R +N T +KF + 
Sbjct: 179 VRIKEAEPQPRDEYWDLSELKSSPLFHSADVVID-PYFEVEKSLNFTREINGSTPLKFTY 237 

Query: 253 TSVHGVGHSFVQSAFKAFDLVPPE--AVPEQRDPDPEFPTVKYPNPEEGKGVLTLSFALA 310 

++ HG+G+ + + F F +V EQ+DP+P+FPT+ +PNPEEG+ VLTL+ A 

Sbjct: 238 SAFHGIGYHYTKRMFAEFGFPASSFISVAEQQDPNPDFPTIPFPNPEEGRKVLTLAMETA 297 
Query: 311 DKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWKEKNQDRSALK 370 

DK + ++LANDPDADR+ +AEKQ GEWRVF+GNE+GAL+ WW++T+W++ N + A K 
Sbjct: 298 DKNGSTVILANDPDADRI(»!AEKQKDGEWRVFTGNEMGALITWWIWTNWRKANPNADASK 357 

Query: 371 DTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVLFAFEEAIGYM 430 

Y+L+S VSS+I++ I A EGF E TLTGFKWMGNRA++L G V+ A+EE+IGYM 
Sbjct: 358 -VYILNSAVSSQIVKTIADAEGFKNETTLTGFKWMGNRAEELRADGNQVILAWEESIGYM 416 

Query: 431 CCP-FVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKASYFICHDQET 489 

P +DKDGVSAA + AE+A+FL + SL QL A+Y YG+H+ +++Y++ E 
Sbjct: 417 --PGHTMDKDGVSAAAVFAEIAAFLHABGKSLQDQLYALYNRYGFHLVRSTYWMVPAPEV 474 

Query: 490 IKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSKSSQMITFTF 549 

KKLF LR D K +P G+ E++++RDLT GYD+S+PD K VLP S SS+M+TF 
Sbjct: 475 TKKLFSTLRA-DLK--FPTKIGEAEVASVRDLTIGYDNSKPDNKPVLPLSTSSEMVTFFL 531 

Query: 550 ANGGVATMRTSGTEPKIKYYAELCAPPGNS--DPEQLKKELNELVSAIEEHFFQPQKYNL 607 

G V T+R SGTEPKIKYY EL PG + D E + E+++L + ■I-PQ++ L 

Sbjct: 532 KTGSVTTLRASGTEPKIKYYIELITAPGKTQNDLESVISEMDQLEKDWATLLRPQQFGL 591 

Query: 608 QPK 610 
P+ 

Sbjct: 592 IPR 594 


Pedant information for DKFZphfkd2_24bl5, frame 1 


Report for DKFZphfkd2_24bl5.1 


[LENGTH] 612 

[MW] 68311.58 

[pI] 6.28 

[HOMOL] TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 1e-157 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YMR278w] 1e-111 

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0740] 3e-66 

[FUNCAT] c energy conversion [M. genitalium, MG053] 4e-50 

[FUNCAT] m outer membrane and cell wall [H. influenzae, HI1463] 2e-04 

[BLOCKS] BL00607D cAMP phosphodiesterases class-II proteins 

[BLOCKS] BL00710 Phosphoglucomutase and phosphomannomutase phosphoserine signa 

[EC] 5.4.2.8 Phosphomannomutase 3e-56 

[EC] 5.4.2.2 Phosphoglucomutase 1e-09 

[PIRKW] isomerase 3e-56 

[PIRKW] intramolecular transferase 3e-56 

[SUPFAM] Methanobacterium thermoautotrophicum phosphomannomutase 1e-06 

[SUPFAM] probable phosphorylating protein ureC 9e-06 

[PROSITE] PGM_PMM 1 

[PROSITE] MYRISTYL 10 

[PROSITE] LIPOCALIN 2 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Phosphoglucomutase and phosphomannomutase phosphoserine 

[KW] Alpha_Beta 


SEQ MAAPEGSGLGEDARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTA 

PRD ccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhcchhhhhhhhhhhhccccc 

SEQ GLRAAMGPGISRMNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRF 

PRD cccccccccccccceeeeeehhhhhhhhhhhhcccccceeeeeecccccccccccchhhh 

SEQ ARLAATTFISQGIPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDN 

PRD hhhhhhhhhhccceeeeeccccccccchhhhhhhcccceeeeeeccccccccceeeeecc 

SEQ GAQIISPHDKGISQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHR 

PRD ccccccccchhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhcc 

SEQ SVNRETKVKFVHTSVHGVGHSFVQSAFKAFDLVPPEAVPEQRDPDPEFPTVKYPNPEEGK 

PRD ccccccceeeeeeeccccccchhhhhhhhhcccccccccccccccccccccccccccchh 

SEQ GVLTLSFALADKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWK 

PRD hhhhhhhhhhhhhcceeeeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhh 

SEQ EKNQDRSALKDTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVL 

PRD hcccccccccceeeeeeeehhhhhhhhhhhcccceeeeeccccchhhhhhhhhhccceee 
SEQ FAFEEAIGYMCCPFVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKAS 

PRD hhhhhccccccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccccc 

SEQ YFICHDQETIKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSK 

PRD eeeccchhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccccccc 

SEQ SSQMITFTFANGGVATMRTSGTEPKIKYYAELCAPPGNSDPEQLKKELNELVSAIEEHFF 

PRD ccceeeeeecccceeeeecccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhh 

SEQ QPQKYNLQPKAD 

PRD cccccccccccc 


Prosite for DKFZphfkd2_24bl5.1 


PS00001 458->462 ASN_GLYCOSYLATION PDOC00001 
PS00002 7->11 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005 
PS00005 290->293 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00005 380->383 PKC_PHOSPHO_SITE PDOC00005 
PS00005 489->492 PKC_PHOSPHO_SITE PDOC00005 
PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005 
PS00005 556->559 PKC_PHOSPHO_SITE PDOC00005 
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006 
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006 
PS00006 343->347 CK2_PHOSPHO_SITE PDOC00006 
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006 
PS00006 523->527 CK2_PHOSPHO_SITE PDOC00006 
PS00006 528->532 CK2_PHOSPHO_SITE PDOC00006 
PS00006 560->564 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 593->597 CK2_PHOSPHO_SITE PDOC00006 
PS00008 6->12 MYRISTYL PDOC00008 
PS00008 61->67 MYRISTYL PDOC00008 
PS00008 100->106 MYRISTYL PDOC00008 
PS00008 159->165 MYRISTYL PDOC00008 
PS00008 191->197 MYRISTYL PDOC00008 
PS00008 257->263 MYRISTYL PDOC00008 
PS00008 344->350 MYRISTYL PDOC00008 
PS00008 348->354 MYRISTYL PDOC00008 
PS00008 440->446 MYRISTYL PDOC00008 
PS00008 552->558 MYRISTYL PDOC00008 
PS00710 159->174 PGM_PMM PDOC00589 
PS00213 346->358 LIPOCALIN PDOC00187 
PS00213 344->358 LIPOCALIN PDOC00187 


Pfam for DKFZphfkd2_24bl5.1 


HMM_NAME Phosphoglucomutase and phosphomannomutase phosphoserine 

HMM *GvnVIdIGQNGMMPTPMIYFaIRTYKhxncmggGIMITaSHNPGGPDnDN 
G+ V + ++PTP + F + H+++ +GIMITASHNP DN 
Query 132 GIPVYLFS— DITPTPFVPFTVS HLKLCAGIMITASHNP— KQ-DN 172 

HMM GIK* 
G+K 

Query 173 GYK 175 
DKFZphfkd2_24e23 


group: kidney derived 

DKFZphfkd2_24e23 encodes a novel 198 amino acid protein without similarity to 
known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of 
kidney-specific genes. 


unknown 

complete cDNA, complete cds, 1 EST hit, 
many ATGs in front of the ORF 

Sequenced by GBF 

Locus: unknown 

Insert length: 1723 bp 

Poly A stretch at pos. 1695, no polyadenylation signal found 


1 GGGGGATTTT CGATCATGAC AACGATAGCA ATTGATATAC CTTCAAAATA 
51 CGTGTCCAGT GAGTGTTGAT TGTGTGTGGT TTCTCTAGGA GACCGTGTTC 
101 ATGCAACACA GCATTATTTC ACCGCCTTTA CCCCAGCTTC TTCATACACA 
151 TGCACTTGTC AAGGGCTCTT TGGCTGAAGA GAAGTTAGAA GTTTCCAGAT 
201 ATGGAGGGGT ATTTTCAGCA GATATGCCCA CCGCCATGGT TTTGTCAGCT 
251 CTGTAGGGTG GTCTTGCACC CTGCTCACTG CTGGCATCAC CTGAGCCTAT 
301 GGCAGATACC CAGTGCTGCC CGCCACCATG TGAATTCATC AGCTCTGCAG 
351 GCACAGACCT TGCACTAGGA ATGGGCTGGG ACGCCACCCT CTGCCTCTTA 
401 CCATTCACTG GGTTTGGCAA GTGTGCTGGG ATCTGGAATC ACATGGATGA 
451 GGAACCCGAT AATGGTGACG ACCGAGGTAG CAGGCGAACC ACTGGCCAGG 
501 GCAGGAAGTG GGCAGCTCAC GGGACTATGG CTGCACCGCG GGTTCATACC 
551 GACTACCATC CTGGAGGTGG GAGCGCATGC TCATCTGTAA AAGTCCGGTC 
601 CCACGTTGGA CACACCGGGG TCTTCTTCTT TGTTGACCAG GATCCTCTGG 
651 CAGTGTCTTT AACAAGCCAG AGTCTGATCC CACCGCTCAT AAAGCCAGGG 
701 TTGTTGAAAG CTTGGGGCTT CCTCCTCCTC TGTGCGCAGC CCTCAGCAAA 
751 CGGTCACAGC CTGTGCTGTC TGCTGTACAC CGACTTGGTA TCATCCCATG 
801 AACTGTCCCC CTTTCGTGCT CTGTGCTTAG GGCCCTCTGA TGCCCCATCT 
851 GCCTGCGCTT CCTGCAACTG TTTAGCAAGC ACCTATTATC TATAGGGTGC 
901 TGGGGTGCTG GGCGAGGCCA ATCGCTCCTA TTACTTTCTG CCCTGGGGAC 
951 GTCCTGTTTT CCCACCTACC CCTGTAACGC CTCTGCTCTG CCTTCCCATC 
1001 TGCGGGCCTA ACGCCATCCC ACAAGGGCTG GGCTGTCCGT TCAGAAGAGA 
1051 AACTGGGAAG GGGCCTTGAG GACCTGTGTC CAGGCAGGGT GGACAAGGGC 
1101 TTTGTGCAGG GAGCTCCTCT CCCATCTTTG TGTCCTGACA GCCGTGACCG 
1151 TGACCCCTCA AAGCAGAGCC AGTAGTGATC AGTATCCTGC TGCTTCAAGC 
1201 CTGCACGGTC CTCTTCTCCT CTCCGCACAT CTGCATGCCT GTCAAACCCA 
1251 GAGTAGTTTG GGGCCTGGTA AACAGAGGGA AGTTGGCTGG AGGAGGCCAG 
1301 TCAGGAGTGC AAGAACCCCG CGTACTCTGT CCCACGTGGA TAAAGTCTCT 
1351 AATTCCAGTC TGAGGTGAAT TCTTAGAGAG TGCTTTCATT TAATGTTTGC 
1401 TTTATGCATT TCCCCTGCAG CTGTGACTAA TTGTGGAACA GCATACATTT 
1451 TGTTTTGAGA CTCTCTTGAG ATTTTTCTGG CAGTGTAAGG TCTACACCAT 
1501 TTTCCTCTCA GCATCAGAGA AGGCAGAAAG CAAGAGAAAG GAATGCAATG 
1551 TGAGCAAGGC CAGGCACACT TGTGCTACTG CAGTTGGCAA GAATGGAGTC 
1601 TAATCCCAGC ACTTTGGGAG GCCGAGGCGG GTGGATCACC TGAGGTCAGG 
1651 AATTTGAGAC CAACCTGGCC AACATGTTGA AACCTCGTCT GTACTAAAAA 
1701 TACAAAAAAA AAAAAAAAAA AAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 
ORF from 299 bp to 892 bp; peptide length: 198 
Category: putative protein 


1 MADTQCCPPP CEFISSAGTD LALGMGWDAT LCLLPFTGFG KCAGIWNHMD 
51 EEPDNGDDRG SRRTTGQGRK WAAHGTMAAP RVHTDYHPGG GSACSSVKVR 
101 SHVGHTGVFF FVDQDPLAVS LTSQSLIPPL IKPGLLKAWG FLLLCAQPSA 
151 NGHSLCCLLY TDLVSSHELS PFRALCLGPS DAPSACASCN CLASTYYL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24e23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24e23, frame 2 


Report for DKFZphfkd2_24e23.2 


[LENGTH] 198 

[MW] 20948.98 

[pI] 6.01 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 6.06 % 


SEQ MADTQCCPPPCEFISSAGTDLALGMGWDATLCLLPFTGFGKCAGIWNHMDEEPDNGDDRG 

SEG 

PRD ccccccccccccccccccccccccccccceeeeeccccccceeeeccccccccccccccc 

SEQ SRRTTGQGRKWAAHGTMAAPRVHTDYHPGGGSACSSVKVRSHVGHTGVFFFVDQDPLAVS 

SEG 

PRD cccccccccccccccccccceeeeecccccccccceeeeeeeccccceeeeeccccceee 

SEQ LTSQSLIPPLIKPGLLKAWGFLLLCAQPSANGHSLCCLLYTDLVSSHELSPFRALCLGPS 

SEG xxxxxxxxxxxx 

PRD eccccccccccccchhhhhhhhhhhccccccccceeeeeeeee ccccccccceeeecccc 

SEQ DAPSACASCNCLASTYYL 

SEG 

PRD cccccccccccccccccc 


Prosite for DKFZphfkd2_24e23.2 


PS00004 62->66 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005 
PS00005 96->99 PKC_PHOSPHO_SITE PDOC00005 
PS00006 165->169 CK2_PHOSPHO_SITE PDOC00006 
PS00008 18->24 MYRISTYL PDOC00008 
PS00008 60->66 MYRISTYL PDOC00008 
PS00008 89->95 MYRISTYL PDOC00008 
PS00008 91->97 MYRISTYL PDOC00008 
PS00008 134->140 MYRISTYL PDOC00008 
PS00009 67->71 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfkd2_24e23.2) 
DKFZphfkd2_24n20 


group: intracellular transport and trafficking 

DKFZphfkd2_24n20.3 encodes a novel 366 amino acid protein with similarity to human eps8 
binding protein e3B1 and spectrins. 

The new protein contains an Src homology 3 (SH3) domain and is similar to human eps8 SH3 domain 
binding protein 1 (e3B1) and spectrins. Eps8 is a substrate of receptor tyrosine kinases 
involved in mitogenic signaling. Spectrin is part of the submembrane cytoskeletal network in 
the human erythrocyte ghost. Nonerythroid spectrins are proposed to have roles in cell 
adhesion, establishment of cell polarity, and attachment of other cytoskeletal structures to 
the plasma membrane. The new protein seems to be part of the signalling pathway between 
tyrosine kinases and the membrane/cytoskeleton. 

The new protein can find application in modulating cell adhesion/motility and 
membrane/cytoskeleton structure and dynamics. 


strong similarity to eps8 binding protein e3B1 
complete cDNA, complete cds, few EST hits 

potential start at bp 300, but there are ATGs in other frames in 
5' region of the cDNA 

Sequenced by GBF 

Locus: /map="17" 

Insert length: 1719 bp 

Poly A stretch at pos. 1699, polyadenylation signal at pos. 1680 


1 GGGGACAGCT GCCCCGACCT TGGCTTCCTC TGCTGGGTGG GATTGGGGGC 
51 TGGGCCCCCA AATGGGCCCC TGGCTTCCCC CTTCCTCTGG GCAGGGGACA 
101 GAGAGACACA GGCTCGGGGA GCAGGACTGA CTTCCTCTTG TCCCGGAATG 
151 AGCATGCCTG CCCTTTGCAA GCAGGTTTGG GTCTCACGCA GAGGAAACCA 
201 AAAGCAATAA GAGGGAGGGA AGGCAGAGCA ACCAATCAAG GGCAGGGTGA 
251 GACTCAAAAC GAGCGGGCTC CCTGGGGAGC CAGACAGAGG CTGGGGGTGA 
301 TGGCGGAGCT ACAGCAGCTG CAGGAGTTTG AGATCCCCAC TGGCCGGGAG 
351 GCTCTGAGGG GCAACCACAG TGCCCTGCTG CGGGTCGCTG ACTACTGCGA 
401 GGACAACTAT GTGCAGGCCA CAGACAAGCA GAAGGCGCTG GAGGAGACCA 
451 TGGCCTTCAC TACCCAGGCA CTGGCCAGCG TGGCCTACCA GGTGGGCAAC 
501 CTGGCCGGGC ACACTCTGCG CATGTTGGAC CTGCAGGGGG CCGCCCTGCG 
551 GCAGGTGGAA GCCCGTGTAA GCACGCTGGG CCAGATGGTG AACATGCATA 
601 TGGAGAAGGT GGCCCGAAGG GAGATCGGCA CCTTAGCCAC TGTCCAGCGG 
651 CTGCCCCCCG GCCAGAAGGT CATCGCCCCA GAGAACCTAC CCCCTCTCAC 
701 GCCCTACTGC AGGAGACCCC TCAACTTTGG CTGCCTGGAC GACATTGGCC 
751 ATGGGATCAA GGACCTCAGC ACGCAGCTGT CAAGAACAGG CACCCTGTCT 
801 CGAAAGAGCA TCAAGGCCCC TGCCACACCC GCCTCCGCCA CCTTGGGGAG 
851 ACCGCCCCGG ATTCCCGAGC CAGTGCACCT GCCGGTGGTG CCCGACGGCA 
901 GACTCTCCGC CGCCTCCTCT GCGTCTTCCC TGGCCTCGGC CGGCAGCGCC 
951 GAAGGTGTCG GTGGGGCCCC CACGCCCAAG GGGCAGGCAG CACCTCCAGC 
1001 CCCACCTCTC CCCAGCTCCT TGGACCCACC TCCTCCACCA GCAGCCGTCG 
1051 AGGTGTTCCA GCGGCCTCCC ACGCTGGAGG AGTTGTCCCC ACCCCCACCG 
1101 GACGAAGAGC TGCCCCTGCC ACTGGACCTG CCTCCTCCTC CACCCCTGGA 
1151 TGGAGATGAA TTGGGGCTGC CTCCACCCCC ACCAGGATTT GGGCCTGATG 
1201 AGCCCAGCTG GGTGCCTGCC TCATACTTGG AGAAAGTGGT GACACTGTAC 
1251 CCATACACCA GCCAGAAGGA CAATGAGCTC TCCTTCTCTG AGGGCACTGT 
1301 CATCTGTGTC ACTCGCCGCT ACTCCGATGG CTGGTGCGAG GGCGTCAGCT 
1351 CGGAGGGGAC TGGATTCTTC CCTGGGAACT ATGTGGAGCC CAGCTGCTGA 
1401 CAGCCCAGGG CTCTCTGGGC AGCTGATGTC TGCACTGAGT GGGTTTCATG 
1451 AGCCCCAAGC CAAAACCAGC TCCAGTCACA GCTGGACTGG GTCTGCCCAC 
1501 CTCTTGGGCT GTGAGCTGTG TTCTGTCCTT CCTCCCATCG GAGGGAGAAG 
1551 GGGTCCTGGG GAGAGAGAAT TTATCCAGAG GCCTGCTGCA GATGGGGAAG 
1601 AGCTGGAAAC CAAGAAGTTT GTCAACAGAG GACCCCTACT CCATGCAGGA 
1651 CAGGGTCTCC TGCTGCAAGT CCCAACTTTG AATAAAACAG ATGATGTCCA 
1701 AAAAAAAAAA AAAAAAAAA 


BLAST Results 


Entry AC004797 from database EMBL: 

Homo sapiens chromosome 17, clone hRPC.62_0_9, complete sequence. 
Score = 2316, P = 5.9e-255, identities = 464/465 
7 exons bp 93317-110902 

Medline entries 


97163405: 

Isolation and characterization of e3B1, an eps8 binding 
protein that regulates cell growth. 

98256293: 

Identification of a candidate human spectrin Src homology 3 
domain-binding protein suggests a general mechanism of 
association of tyrosine kinases with the spectrin-based 
membrane skeleton. 


Peptide information for frame 3 


ORF from 300 bp to 1397 bp; peptide length: 366 
Category: strong similarity to known protein 


1 MAELQQLQEF EIPTGREALR GNHSALLRVA DYCEDNYVQA TDKQKALEET 
51 MAFTTQALAS VAYQVGNLAG HTLRMLDLQG AALRQVEARV STLGQMVNMH 
101 MEKVARREIG TLATVQRLPP GQKVIAPENL PPLTPYCRRP LNFGCLDDIG 
151 HGIKDLSTQL SRTGTLSRKS IKAPATPASA TLGRPPRIPE PVHLPVVPDG 
201 RLSAASSASS LASAGSAEGV GGAPTPKGQA APPAPPLPSS LDPPPPPAAV 
251 EVFQRPPTLE ELSPPPPDEE LPLPLDLPPP PPLDGDELGL PPPPPGFGPD 
301 EPSWVPASYL EKVVTLYPYT SQKDNELSFS EGTVICVTRR YSDGWCEGVS 
351 SEGTGFFPGN YVEPSC 
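
The peptide can be checked against the cDNA by translating the ORF directly. The following Python sketch assumes the standard genetic code (NCBI translation table 1) and translates only the first ten codons of the ORF, bp 300 to 329 of the insert sequence listed above. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 up to, but excluding, the first stop."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# bp 300-329 of the insert: the start codon and the nine codons after it.
orf_start = "ATGGCGGAGCTACAGCAGCTGCAGGAGTTT"
print(translate(orf_start))  # MAELQQLQEF
```

Residues that cannot be read reliably in the printed peptide can be recovered this way wherever the nucleotide sequence itself is legible.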

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24n20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24n20, frame 3 


Report for DKFZphfkd2_24n20.3 


[LENGTH] 366 

[MW] 38947.21 

[pI] 4.93 

[HOMOL] TREMBL:U87166_1 gene: "SSH3BP1"; product: "spectrin SH3 domain binding protein 
1"; Homo sapiens spectrin SH3 domain binding protein 1 (SSH3BP1) mRNA, complete cds. 3e-48 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YGR136w] 9e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGR136w] 9e-06 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR154w] 3e-05 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR388w] 2e-04 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR388w] 
2e-04 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR162c] 4e-04 

[BLOCKS] BL50002B Src homology 3 (SH3) domain proteins profile 

[SUPFAM] SH3 homology 6e-17 

[PROSITE] MYRISTYL 6 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Src homology domain 3 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 24.04 % 


SEQ MAELQQLQEFEIPTGREALRGNHSALLRVADYCEDNYVQATDKQKALEETMAFTTQALAS 

SEG 

1aboA !!!!!!!!!! 

SEQ VAYQVGNLAGHTLRMLDLQGAALRQVEARVSTLGQMVNMHMEKVARREIGTLATVQRLPP 

SEG 

1aboA 

SEQ GQKVIAPENLPPLTPYCRRPLNFGCLDDIGHGIKDLSTQLSRTGTLSRKSIKAPATPASA 

SEG 

1aboA ' ' 

SEQ TLGRPPRIPEPVHLPVVPDGRLSAASSASSLASAGSAEGVGGAPTPKGQAAPPAPPLPSS 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

1aboA 

SEQ LDPPPPPAAVEVFQRPPTLEELSPPPPDEELPLPLDLPPPPPLDGDELGLPPPPPGFGPD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

1aboA 

SEQ EPSWVPASYLEKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSEGTGFFPGN 

SEG xx 

1aboA EECCCBCCCTTTBCCBTTTEEBEEEEETTTTEEEEEETTEEEEEEGG 

SEQ YVEPSC 

SEG 

1aboA GEEE.. 
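
The low-complexity percentage in the report is the share of residues masked with "x" on the SEG lines. The following Python sketch assumes exactly that definition, which is inferred from the figures in this listing rather than documented here. The counts used below (88 masked residues of 366, and 12 of 198 for the 198 amino acid protein reported earlier) are the values implied by the reported 24.04 % and 6.06 %; the function name is hypothetical.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues marked 'x' in a SEG mask, rounded to 2 decimals."""
    masked = seg_mask.lower().count("x")
    return round(100.0 * masked / length, 2)


# Illustrative mask: 88 masked positions in a 366-residue protein.
mask = "x" * 88
print(low_complexity_percent(mask, 366))  # 24.04
```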


Prosite for DKFZphfkd2_24n20.3 


PS00001 22->26 ASN_GLYCOSYLATION PDOC00001 
PS00004 339->343 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 41->44 PKC_PHOSPHO_SITE PDOC00005 
PS00005 72->75 PKC_PHOSPHO_SITE PDOC00005 
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005 
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005 
PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005 
PS00005 321->324 PKC_PHOSPHO_SITE PDOC00005 
PS00005 338->341 PKC_PHOSPHO_SITE PDOC00005 
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006 
PS00006 239->243 CK2_PHOSPHO_SITE PDOC00006 
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 321->325 CK2_PHOSPHO_SITE PDOC00006 
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006 
PS00008 21->27 MYRISTYL PDOC00008 
PS00008 66->72 MYRISTYL PDOC00008 
PS00008 94->100 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 215->221 MYRISTYL PDOC00008 
PS00008 332->338 MYRISTYL PDOC00008 


Pfam for DKFZphfkd2_24n20.3 


HMM_NAME Src homology domain 3 

HMM *pyVI ALYDYqAqdpDELSFkEGDI I il lEdsDD . WWrgRnnnTNGQEGW 
++V+ LY+Y++Q ++ELSF EG +1 + + D W++G + +G+ 
Query 311 EKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSE GTGF 

HMM IPSNYVEPi* 
+P NYVEP 

Query 357 FPGNYVEPS 365 
DKFZphfkd2_24p5 


group: intracellular transport and trafficking 

DKFZphfkd2_24p5 encodes a novel 811 amino acid protein which is a novel splice variant of 
human ankyrin G. 

The ankyrin 3 gene encodes a novel ankyrin, which is expressed in multiple tissues, with very 
high expression at the axonal initial segment and nodes of Ranvier of neurons in the central 
and peripheral nervous systems. Ankyrin G undergoes several kinds of tissue-specific 
alternative mRNA processing. The different ankyrin G proteins participate in 
maintenance/targeting of ion channels and cell adhesion molecules to nodes of Ranvier and 
axonal initial segments. 

The new protein can find application in modulating the structure and membrane topology of 
Ranvier nodes and other neuronal cell membranes. 


Human ankyrin G (ANK-3) new splice variant 
splice variant 

potential frame shift at 2720 was checked 
see BLASTX 

Sequenced by EMBL 

Locus: /map="10q21" 

Insert length: 3470 bp 

Poly A stretch at pos. 3459, no polyadenylation signal found 


1 AGCTTTAAAA GGATGTCTGC GAAGTGGTCA AAAGGATCTT AACCTCAATT 
51 AAGTGGGGTT TTTTAAAAAG ATTTTTTGGG GGGCCTGAAA TTTTGAAAAT 
101 CTTCGAACTC TGAGTGGGGA AAGATGTATA ATTCCTCAAT TGCCTACGAG 
151 GATATCAAGA TGCTGAGAGG AATTCAGCGG TGGTGAAGAG AGTGGATACA 
201 AACCAGGGAT TGGTTTCCTT GAGCTGTTTT GGAGGTTGAT TCTAAATCAC 
251 TGCTTAAGGA ATTCCTGGAA ACATCAGGAA AACATTTGAT CATCCAAGCC 
301 TAGTGGAAAT GGCTTTACCG CAGAGTGAAG ATGCAATGAC CGGGGACACA 
351 GACAAATATC TTGGGCCACA GGACCTTAAG GAATTGGGTG ATGATTCCCT 
401 GCCTGCAGAG GGTTACATGG GCTTTAGTCT CGGAGCGCGT TCTGCCAGCC 
451 TCCGCTCCTT CAGTTCGGAT GGGTCTTACA CCTTGAACAG AAGCTCCTAT 
501 GCACGGGACA GCATGATGAT TGAAGAACTC CTCGTGCCAT CCAAAGAGCA 
551 GCATCTAACA TTCACAAGGG AATTTGATTC AGATTCTCTT AGACATTACA 
601 GCTGGGCTGC AGACACCTTA GACAATGTCA ATCTTGTTCC AAGCCCCATT 
651 CATTCTGGGT TTCTGGTTAG CTTTATGGTG GACGCGAGAG GGGGCTCCAT 
701 GAGAGGAAGC CGTCATCACG GGATGAGAAT CATCATTCCT CCACGCAAGT 
751 GTACGGCCCC CACTCGAATC ACCTGCCGTT TGGTAAAGAG ACATAAACTG 
801 GCCAACCCAC CCCCCATGGT GGAAGGAGAG GGATTAGCCA GTAGGCTGGT 
851 AGAAATGGGT CCTGCAGGGG CACAATTTTT AGGCCCTGTC ATAGTGGAAA 
901 TCCCTCACTT TGGGTCCATG AGAGGAAAAG AGAGAGAACT CATTGTTCTT 
951 CGAAGTGAAA ATGGTGAAAC TTGGAAGGAG CATCAGTTTG ACAGCAAAAA 
1001 TGAAGATTTA ACCGAGTTAC TTAATGGCAT GGATGAAGAA CTTGATAGCC 
1051 CAGAAGAGTT AGGGAAAAAG CGTATCTGCA GGATTATCAC GAAAGATTTC 
1101 CCCCAGTATT TTGCAGTGGT TTCCCGGATT AAGCAGGAAA GCAACCAGAT 
1151 TGGTCCTGAA GGTGGAATTC TGAGCAGCAC CACAGTGCCC CTTGTTCAAG 
1201 CATCTTTCCC AGAGGGTGCC CTAACTAAAA GAATTCGAGT GGGCCTCCAG 
1251 GCCCAGCCTG TTCCAGATGA AATTGTGAAA AAGATCCTTG GAAACAAAGC 
1301 AACTTTTAGC CCAATTGTCA CTGTGGAACC AAGAAGACGG AAATTCCATA 
1351 AACCAATCAC AATGACCATT CCGGTGCCCC CGCCCTCAGG AGAAGGTGTA 
1401 TCCAATGGAT ACAAAGGGGA CACTACACCC AATCTGCGTC TTCTCTGTAG 
1451 CATTACAGGG GGCACTTCGC CTGCTCAGTG GGAAGACATC ACAGGAACAA 
1501 CTCCTTTGAC GTTTATAAAA GATTGTGTCT CCTTTACAAC CAATGTTTCA 
1551 GCCAGATTTT GGCTTGCAGA CTGCCATCAA GTTTTAGAAA CTGTGGGGTT 
1601 AGCCACGCAA CTGTACAGAG AATTGATATG TGTTCCATAT ATGGCCAAGT 
1651 TTGTTGTTTT TGCCAAAATG AATGATCCCG TAGAATCTTC CTTGCGATGT 
1701 TTCTGCATGA CAGATGACAA AGTGGACAAA ACTTTAGAGC AACAAGAGAA 
1751 TTTTGAGGAA GTCGCAAGAA GCAAAGATAT TGAGGTTCTG GAAGGAAAAC 
1801 CTATTTATGT TGATTGTTAT GGAAATTTGG CCCCACTTAC CAAAGGAGGA 
1851 CAGCAACTTG TTTTTAACTT TTATTCTTTC AAAGAAAATA GACTGCCATT 
1901 TTCCATCAAG ATTAGAGACA CCAGCCAAGA GCCCTGTGGT CGTCTGTCTT 
1951 TTCTGAAAGA ACCAAAGACA ACAAAAGGAC TGCCTCAAAC AGCGGTTTGC 
2001 AACTTAAATA TCACTCTGCC AGCACATAAA AAGATTGAGA AAACAGATGG 
2051 ACGACAGAGC TTCGCATCCT TAGCTTTACG TAAGCGCTAC AGCTACTTGA 
2101 CTGAGCCTGG AATGAGTCCA CAGAGTCCAT GTGAACGGAC AGATATCAGG 
2151 ATGGCAATAG TAGCCGATCA CCTGGGACTT AGTTGGACAG AACTGGCAAG 
2201 GGAACTGAAT TTTTCAGTGG ATGAAATCAA TCAAATACGT GTGGAAAATC 
2251 CAAATTCTTT AATTTCTCAG AGCTTCATGT TTTTAAAAAA ATGGGTTACC 
2301 AGAGACGGAA AAAATGCCAC AACTGATGCC TTAACTTCGG TCTTGACAAA 
2351 AATTAATCGA ATAGATATAG TGACACTGCT AGAAGGACCA ATATTTGATT 
2401 ATGGAAATAT TTCAGGCACC AGAAGTTTTG CAGATGAGAA CAATGTTTTC 
2451 CATGACCCTG TTGATGGTTA TCCTTCCCTT CAAGTGGAAC TGGAAACCCC 
2501 CACAGGGTTG CACTACACAC CACCTACCCC TTTCCAGCAA GATGATTATT 
2551 TTAGTGATAT CTCTAGCATA GAATCTCCCC TTAGAACCCC TAGTAGACTG 
2601 AGTGATGGGC TAGTGCCTTC CCAGGGGAAC ATAGAGCATT CCGCAGATGG 
2651 ACCTCCAGTC GTAACTGCAG AAGACGCTTC CTTAGAAGAC AGCAAACTGG 
2701 AAGACTCAGT GCCTTTAACA GAAATGCCTG AAGCAGTGAT GTAGATGAGA 
2751 GCCAGTTGGA GAATGTATGT CTGAGTTGGC AGAATGAGAC ATCAAGTGGA 
2801 AACCTAGAGT CCTGCGCTCA AGCTCGAAGA GTAACTGGTG GGTTACTAGA 
2851 TCGACTGGAT GACAGCCCTG ACCAGTGTAG AGATTCCATT ACCTCATATC 
2901 TCAAAGGAGA AGCTGGCAAA TTTGAAGCAA ATGGAAGCCA TACAGAAATC 
2951 ACTCCAGAAG CAAAGACAAA ATCTTACTTT CCAGAATCCC AAAATGATGT 
3001 AGGAAAACAG AGTACCAAGG AAACTCTGAA ACCAAAAATA CATGGATCTG 
3051 GTCATGTTGA AGAACCAGCA TCACCACTAG CAGCATATCA GAAATCTCTA 
3101 GAAGAAACCA GCAAGCTTAT AATAGAAGAG ACTAAACCCT GTGTGCCTGT 
3151 CAGTATGAAA AAGATGAGTA GGACTTCTCC AGCAGATGGC AAGCCAAGGC 
3201 TTAGCCTCCA TGAAGAAGAG GGGTCCAGTG GGTCTGAGCA AAAGCAGGGA 
3251 GAAGGTTTTA AGGTGAAAAC GAAGAAAGAA ATCCGGCATG TGGAAAAGAA 
3301 GAGCCACTCG TAACAGCGAA CGGTCAGTCA AGGATCATAA GTTTTTACTG 
3351 CCAGTATTGA GAAATTCGTG GAAGAAATGT CAGCAGGAAG TAAAAATTCA 
3401 CCGAGAAGTG TGTGTGTGTT CGCTGCTTCC ACACATTAAT GGCATGATTT 
3451 TTTTTATGCA AAAAAAAAAA 
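
The numbered listings in this document give a 1-based position label followed by blocks of ten bases. The following is a minimal Python sketch of how such a listing could be joined into one string while checking every label against the running length. The function name `parse_listing` is hypothetical and not part of the source; the three sample lines are copied from the end of the listing above.

```python
import re

def parse_listing(lines):
    """Join a numbered sequence listing; return (first_label, sequence)."""
    chunks = []
    start = None
    for raw in lines:
        m = re.match(r"\s*(\d+)\s+(.*)", raw)
        if not m:
            continue  # skip blank lines and anything without a leading number
        label = int(m.group(1))
        # Keep only nucleotide letters; drops spaces and stray characters.
        bases = re.sub(r"[^ACGTN]", "", m.group(2).upper())
        if start is None:
            start = label
        expected = start + sum(len(c) for c in chunks)
        if label != expected:
            # A split or damaged label (for example "24 51") ends up here.
            raise ValueError(f"line labelled {label}, expected {expected}")
        chunks.append(bases)
    return start, "".join(chunks)

listing = [
    "3351 CCAGTATTGA GAAATTCGTG GAAGAAATGT CAGCAGGAAG TAAAAATTCA",
    "3401 CCGAGAAGTG TGTGTGTGTT CGCTGCTTCC ACACATTAAT GGCATGATTT",
    "3451 TTTTTATGCA AAAAAAAAAA",
]
start, seq = parse_listing(listing)
end = start + len(seq) - 1  # 1-based position of the last base
print(start, end, len(seq))
```

A label that does not match the running length raises an error instead of being silently accepted, which is a convenient way to spot scanning damage in the position column.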


BLAST Results 


Entry MMANK3A_1 from database TREMBL: 

Ank3"; product: "ankyrin 3"; Mus mu... +3 4022 0.0 2 

Entry HS13616 from database EMBL: 

Human ankyrin G (ANK-3) mRNA, complete cds. 

Length 14,770 

Plus Strand HSPs: 

Score = 8505 (1276.1 bits), Expect = 0.0, Sum P(3) = 0.0 
Identities = 1799/1873 (96%) 
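
The percentages printed next to the identity and positive counts in these reports can be reproduced from the two integers. The sketch below assumes the percentage is truncated rather than rounded, which is what the figures quoted in this document suggest (769/805 is printed as 95%, 167/317 as 52%). The helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated (not rounded) as in the reports."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned

# Counts quoted in the BLAST sections of this document.
reported = {
    (1799, 1873): 96,
    (769, 805): 95,
    (783, 805): 97,
    (167, 317): 52,
    (228, 317): 71,
}
for (matches, aligned), percent in reported.items():
    print(matches, aligned, blast_percent(matches, aligned), percent)
```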


Medline entries 


95394457: 

Chromosomal localization of the ankyrinG gene 
(ANK3/Ank3) to human 10q21 and mouse 10. 

95138209: 

A new ankyrin gene with neural-specific isoforms localized at the 
axonal initial segment and node of Ranvier 


Peptide information for frame 3 


ORF from 309 bp to 2741 bp; peptide length: 811 
Category: known protein 
Classification: unset 


1 MALPQSEDAM TGDTDKYLGP QDLKELGDDS LPAEGYMGFS LGARSASLRS 
51 FSSDGSYTLN RSSYARDSMM IEELLVPSKE QHLTFTREFD SDSLRHYSWA 
101 ADTLDNVNLV PSPIHSGFLV SFMVDARGGS MRGSRHHGMR IIIPPRKCTA 
151 PTRITCRLVK RHKLANPPPM VEGEGLASRL VEMGPAGAQF LGPVIVEIPH 
201 FGSMRGKERE LIVLRSENGE TWKEHQFDSK NEDLTELLNG MDEELDSPEE 
251 LGKKRICRII TKDFPQYFAV VSRIKQESNQ IGPEGGILSS TTVPLVQASF 
301 PEGALTKRIR VGLQAQPVPD EIVKKILGNK ATFSPIVTVE PRRRKFHKPI 
351 TMTIPVPPPS GEGVSNGYKG DTTPNLRLLC SITGGTSPAQ WEDITGTTPL 
401 TFIKDCVSFT TNVSARFWLA DCHQVLETVG LATQLYRELI CVPYMAKFVV 
451 FAKMNDPVES SLRCFCMTDD KVDKTLEQQE NFEEVARSKD IEVLEGKPIY 
501 VDCYGNLAPL TKGGQQLVFN FYSFKENRLP FSIKIRDTSQ EPCGRLSFLK 
551 EPKTTKGLPQ TAVCNLNITL PAHKKIEKTD GRQSFASLAL RKRYSYLTEP 
601 GMSPQSPCER TDIRMAIVAD HLGLSWTELA RELNFSVDEI NQIRVENPNS 
651 LISQSFMFLK KWVTRDGKNA TTDALTSVLT KINRIDIVTL LEGPIFDYGN 
701 ISGTRSFADE NNVFHDPVDG YPSLQVELET PTGLHYTPPT PFQQDDYFSD 
751 ISSIESPLRT PSRLSDGLVP SQGNIEHSAD GPPVVTAEDA SLEDSKLEDS 
801 VPLTEMPEAV M 
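
The ORF coordinates and peptide lengths reported for the clones in this section are mutually consistent if the coordinates are read as 1-based, inclusive and excluding the stop codon. That reading is an assumption inferred from the numbers, not a convention stated in the source. A minimal Python check, with the hypothetical helper `orf_peptide_length`:

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Residue count for an ORF given 1-based inclusive coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span is not a positive multiple of three")
    return n_bases // 3

# (ORF start, ORF end, reported peptide length) as listed in this section.
reported_orfs = [
    (309, 2741, 811),
    (134, 1351, 406),
    (56, 271, 72),
    (16, 960, 315),
    (43, 1050, 336),
]
for start_bp, end_bp, length in reported_orfs:
    print(start_bp, end_bp, orf_peptide_length(start_bp, end_bp), length)
```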

BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24p5, frame 3 

TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds., N = 1, 
Score = 4022, P = 0 

TREMBL:MMANK3B_3 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 

TREMBL:MMANK3B_4 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 


>TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 
Length = 1,094 

HSPs: 

Score = 4022 (603.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 769/805 (95%), Positives = 783/805 (97%) 

Query: 1 MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 60 

MALP SEDA+TGDTDKYLGPQDLKELGDDSLPAEGY+GFSLGARSASLRSFSSD SYTLN 
Sbjct: 1 MALPHSEDAITGDTDKYLGPQDLKELGDDSLPAEGYVGFSLGARSASLRSFSSDRSYTLN 60 

Query: 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 120 

RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLV SP+HSGFLV 
Sbjct: 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVSSPVHSGFLV 120 

Query: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 
Sbjct: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

Query: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 240 

VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDL ELLNG 
Sbjct: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLAELLNG 240 

Query: 241 MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300 

MDEELDSPEELG KRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 
Sbjct: 241 MDEELDSPEELGTKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300 

Query: 301 PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360 

PEGALTKRIRVGLQAQPVP+E VKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 
Sbjct: 301 PEGALTKRIRVGLQAQPVPEETVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360 

Query: 361 GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420 

GEGVSNGYKGD TPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 
Sbjct: 361 GEGVSNGYKGDATPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420 

Query: 421 DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 480 

DCHQVLETVGLA+QLYRELICVPYMAKFVVFAK NDPVESSLRCFCMTDD+VDKTLEQQE 
Sbjct: 421 DCHQVLETVGLASQLYRELICVPYMAKFVVFAKTNDPVESSLRCFCMTDDRVDKTLEQQE 480 

Query: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 
Sbjct: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

Query: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 600 

EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKK EK D RQSFASLALRKRYSYLTEP 
Sbjct: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKAEKADRRQSFASLALRKRYSYLTEP 600 

Query: 601 GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 660 

MSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFM LK 
Sbjct: 601 SMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMLLK 660 

Query: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720 

KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 
Sbjct: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720 

Query: 721 YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 780 

+PS QVELETP GL++TPP PFQQDD+FSDISSIESP RTPSRLSDGLVPSQGNIEH 
Sbjct: 721 HPSFQVELETPMGLYWTPPNPFQQDDHFSDISSIESPFRTPSRLSDGLVPSQGNIEHPTG 780 

Query: 781 GPPVVTAEDASLEDSKLEDSVPLTE 805 

GPPVVTAED SLEDSK++DSV +T+ 
Sbjct: 781 GPPVVTAEDTSLEDSKMDDSVTVTD 805 

Pedant information for DKFZphfkd2_24p5, frame 3 
Report for DKFZphfkd2_24p5.3 

[LENGTH] 811 

[MW] 90104.66 

[pI] 5.40 

[BLOCKS] BL50017B Death domain proteins profile 

[PIRKW] phosphoprotein 0.0 

[PIRKW] alternative splicing 0.0 

[PIRKW] peripheral membrane protein 0.0 

[PIRKW] cytoskeleton 0.0 

[SUPFAM] ankyrin 0.0 

[SUPFAM] ankyrin repeat homology 0.0 

[SUPFAM] unassigned ankyrin repeat proteins 0.0 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 1.73 % 


SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 


MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 
cccccccccccccccccccccccccccccccccccccccccccccceeeeeccc^ 

RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 

PRD cccchhhhhhhhheeeehhhhhhhhhhhccccccccccccccccc^^ 

MMMMMMMMMMMM 

SEQ SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 

SEG xxxxxxxxxxxxxx 

PRD eeeeeccccccccccccceeeecccccccccceeeeehhhhhcccccccccccc^^^ 
MEM MMMMMMMMMMMMMMMM ^-^-'-v.^^-^cc 

SEQ 
SEG 
PRD 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM. 

MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 
cccccchhhhhhhhheeeeeeccccceeeeehhhhhcccccccccccccceeeeeeec^ 


VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 
eecccccceeeceeeeeeccccccccccceeeeeeccccceeeeeccccccchhhhh^^ 


SEQ 
SEG 
PRD 
MEM 


SEQ PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 

SEG 

PRD ccchhhhhhhhhhhhhccccceeeeccccccccccceeeccccccccccceeeecc^ 

MEM 

SEQ GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 

SEG 

PRD 

MEM 


SEQ 

SEG 
PRD 
MEM 


ccccccccccccccceeeeeeeeccccccccccccccceeeeeeccccccccccceeeec 

DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 
cchhhhhhhhhhhhhhhhhhhhcchhhhhheeecccchhhhhhhhcccccjihh^^ 


SEQ NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 


PRD 
MEM 


cceeecccceeeeeeccceeeeecccccccchhhhhhhhhchhhhhhhcc 


eeeeeecccc 


SEQ EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 

SEG * 

PRD 
MEM 


ccccceeeeccccccccccccccccccccccccccccccccchhhhhhhhhhhhheeecc 


SEQ GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 

SEG 

PRD ccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcceeeeecccc^ 
MEM 
SEQ KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 

SEG 

PRD hhhhcccccccchhhhhhhhhhcceeeeeeeccccccccccccccccccccccccccccc 

MEM 


SEQ YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 

SEG 

PRD cccceeeeeccccccccccccccccccccceeeccccccccccccccccccccccccccc 

MEM 


SEQ GPPVVTAEDASLEDSKLEDSVPLTEMPEAVM 

SEG 

PRD ccceeeecccccccccccccccccccccccc 

MEM 


(No Prosite data available for DKFZphfkd2_24p5.3) 
(No Pfam data available for DKFZphfkd2_24p5.3) 


group: transmembrane protein 

DKFZphfkd2_3i13 encodes a novel 406 amino acid protein with similarity to C. elegans cosmid Y37D8A and A. 
thaliana H71412 hypothetical protein. 

The novel protein contains 3 transmembrane regions. 

No informative BLAST results; no predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 


similarity to A. thaliana and C. elegans; 

membrane regions : 3 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map's^l?*' 
Insert length: 2052 bp 

Poly A stretch at pos. 2032, no polyadenylation signal found 


1 AGTGACGTGA GCGGGTTCCG GTTGTCTGGA GCCCAGCGGC GGGTGTGAGA 
51 GTCCGTAAGG AGCAGCTTCC AGGATCCTGA GATCCGGAGC AGCCGGGGTC 
101 GGAGCGGCTC CTCAAGAGTT ACTGATCTAT GAAATGGCAG AGAATGGAAA 
151 AAATTGTGAC CAGAGACGTG TAGCAATGAA CAAGGAACAT CATAATGGAA 
201 ATTTCACAGA CCCCTCTTCA GTGAATGAAA AGAAGAGGAG GGAGCGGGAA 
251 GAAAGGCAGA ATATTGTCCT GTGGAGACAG CCGCTCATTA CCTTGCAGTA 
301 TTTTTCTCTG GAAATCCTTG TAATCTTGAA GGAATGGACC TCAAAATTAT 
351 GGCATCGTCA AAGCATTGTG GTGTCTTTTT TACTGCTGCT TGCTGTGCTT 
401 ATAGCTACGT ATTATGTTGA AGGAGTGCAT CAACAGTATG TGCAACGTAT 
451 AGAGAAACAG TTTCTTTTGT ATGCCTACTG GATAGGCTTA GGAATTTTGT 
501 CTTCTGTTGG GCTTGGAACA GGGCTGCACA CCTTTCTGCT TTATCTGGGT 
551 CCACATATAG CCTCAGTTAC ATTAGCTGCT TATGAATGCA ATTCAGTTAA 
601 TTTTCCCGAA CCACCCTATC CTGATCAGAT TATTTGTCCA GATGAAGAGG 
651 GCACTGAAGG AACCATTTTT TTGTGGAGTA TCATCTCAAA AGTTAGGATT 
701 GAAGCCTGCA TGTGGGGTAT CGGTACAGCA ATCGGAGAGC TGCCTCCATA 
751 TTTCATGGCC AGAGCAGCTC GCCTCTCAGG TGCTGAACCA GATGATGAAG 
801 AGTATCAGGA ATTTGAAGAG ATGCTGGAAC ATGCAGAGTC TGCACAAGAC 
851 TTTGCCTCCC GGGCCAAACT GGCAGTTCAA AAACTAGTAC AGAAAGTTGG 
901 ATTTTTTGGA ATTTTGGCCT GTGCTTCAAT TCCAAATCCT TTATTTGATC 
951 TGGCTGGAAT AACGTGTGGA CACTTTCTGG TACCTTTTTG GACCTTCTTT 
1001 GGTGCAACCC TAATTGGAAA AGCAATAATA AAAATGCATA TCCAGAAAAT 
1051 TTTTGTTATA ATAACATTCA GCAAGCACAT AGTGGAGCAA ATGGTGGCTT 
1101 TCATTGGTGC TGTCCCCGGC ATAGGTCCAT CTCTGCAGAA GCCATTTCAG 
1151 GAGTACCTGG AGGCTCAACG GCAGAAGCTT CACCACAAAA GCGAAATGGG 
1201 CACACCACAG GGAGAAAACT GGTTGTCCTG GATGTTTGAA AAGTTGGTCG 
1251 TTGTCATGGT GTGTTACTTC ATCCTATCTA TCATTAACTC CATGGCACAA 
1301 AGTTATGCCA AACGAATCCA GCAGCGGTTG AACTCAGAGG AGAAAACTAA 
1351 ATAAGTAGAG AAAGTTTTAA ACTGCAGAAA TTGGAGTGGA TGGGTTCTGC 
1401 CTTAAATTGG GAGGACTCCA AGCCGGGAAG GAAAATTCCC TTTTCCAACC 
1451 TGTATCAATT TTTACAACTT TTTTCCTGAA AGCAGTTTAG TCCATACTTT 
1501 GCACTGACAT ACTTTTTCCT TCTGTGCTAA GGTAAGGTAT CCACCCTCGA 
1551 TGCAATCCAC CTTGTGTTTT CTTAGGGTGG AATGTGATGT TCAGCAGCAA 
1601 ACTTGCAACA GACTGGCCTT CTGTTTGTTA CTTTCAAAAG GCCCACATGA 
1651 TACAATTAGA GAATTCCCAC CGCACAAAAA AAGTTCCTAA GTATGTTAAA 
1701 TATGTCAAGC TTTTTAGGCT TGTCACAAAT GATTGCTTTG TTTTCCTAAG 
1751 TCATCAAAAT GTATATAAAT TATCTAGATT GGATAACAGT CTTGCATGTT 
1801 TATCATGTTA CAATTTAATA TTCCATCCTG CCCAACCCTT CCTCTCCCAT 
1851 CCTCAAAAAA GGGCCATTTT ATGATGCATT GCACACCCTC TGGGGAAATT 
1901 GATCTTTAAA TTTTGAGACA GTATAAGGAA AATCTGGTTG GTGTCTTACA 
1951 AGTGAGCTGA CACCATTTTT TATTCTGTGT ATTTAGGATG AAGTCTTGAA 
2001 AAAAACTTTA TAAAGACATC TTTAATCATT CCAAAAAAAA AAAATAAAAA 
2051 AA 


BLAST Results 


Entry AC004686 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens Chromosome 17, clone 
hRPC.1073_F_I5; HTGS phase 1, 8 unordered pieces. 
Score = 4142, P = 6.1e-199, identities = 830/832 

Medline entries 

No Medline entry 

Peptide information for frame 2 


ORF from 134 bp to 1351 bp; peptide length: 406 
Category: similarity to unknown protein 


1 MAENGKNCDQ RRVAMNKEHH NGNFTDPSSV NEKKRREREE RQNIVLWRQP 
51 LITLQYFSLE ILVILKEWTS KLWHRQSIVV SFLLLLAVLI ATYYVEGVHQ 
101 QYVQRIEKQF LLYAYWIGLG ILSSVGLGTG LHTFLLYLGP HIASVTLAAY 
151 ECNSVNFPEP PYPDQIICPD EEGTEGTIFL WSIISKVRIE ACMWGIGTAI 
201 GELPPYFMAR AARLSGAEPD DEEYQEFEEM LEHAESAQDF ASRAKLAVQK 
251 LVQKVGFFGI LACASIPNPL FDLAGITCGH FLVPFWTFFG ATLIGKAIIK 
301 MHIQKIFVII TFSKHIVEQM VAFIGAVPGI GPSLQKPFQE YLEAQRQKLH 
351 HKSEMGTPQG ENWLSWMFEK LVVVMVCYFI LSIINSMAQS YAKRIQQRLN 
401 SEEKTK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_3i13, frame 2 

TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A, N = 1, Score = 905, P = 8.8e-91 

TREMBL:ATAC98_2 gene: "YUP8H12.2"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 470, P = 1.1e-44 

PIR:H71412 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 
293, P = 6e-24 

>TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A 

Length = 457 

HSPs: 

Score = 905 (135.8 bits), Expect = 8.8e-91, P = 8.8e-91 
Identities = 167/317 (52%), Positives = 228/317 (71%) 

Query: 38 REERQNIVLWRQPLITLQYFSLEILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEG 
R ER+ IV WR+P I + Y +EI + E K+ +++++ + + + + Y+ G 
Sbjct: 93 

Query: 98 VHQQYVQRIEKQFLLYAYWIGLGILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNF 
HQ++VQ IEK L +++W+ LG+LSS+GLG+GLHTFL+YLGPHIA+VT+AAYEC S++F 
Sbjct: 153 AHQEHVQTIEKHILWWSWWVLLGVLSSIGLGSGLHTFLIYLGPHIAAVTMAAYECQSLDF 

Query: 158 PEPPYPDQIICPDEEGTEGTIFLWSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGA 
P+PPYP+ I CP + + F W I++KVR+E+ +WG GTA+GELPPYFMARAAR+SG 
Sbjct: 213 PQPPYPESIQCPSTKSSIAVTF-WQIVAKVRVESLLWGAGTALGELPPYFMARAARISGQ 

Query: 218 EPDDEEYQEFEEMLE-HAESAQD----FASRAKLAVQKLVQKVGFFGILACASIPNPLFD 
EPDDEEY+EF E++ ES D RAK V+ + ++GF GIL ASIPNPLFD 
Sbjct: 272 EPDDEEYREFLELMNADKESDADQKLSIVERAKSWVEHNIHRLGFPGILLFASIPNPLFD 

Query: 273 LAGITCGHFLVPFWTFFGATLIGKAIIKMHIQKIFVIITFSKHIVEQMVAFIGAVPGIGP 
LAGITCGHFLVPFW+FFGATLIGKA++KMH+Q FVI+ FS H E V + +P +GP 
Sbjct: 332 LAGITCGHFLVPFWSFFGATLIGKALVKMHVQMGFVILAFSDHHAENFVKILEKIPAVGP 

Query: 333 SLQKPFQEYLEAQRQKLH 350 
+++P + LE QR+ LH 
Sbjct: 392 YIRQPISDLLEKQRKALH 409 

Pedant information for DKFZphfkd2_3i13, frame 2 

Report for DKFZphfkd2_3i13.2 
[LENGTH] 406 
[MW] 46298.17 
[pI] 6.47 
[HOMOL] TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid Y37D8A 1e- 
[PROSITE] MYRISTYL 10 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 3 
[KW] LOW_COMPLEXITY 9.85 % 


SEQ MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREERQNIVLWRQPLITLQYFSLE 
SEG xxxxxxxxxx 
PRD ccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhlih 
MEM MMMMMMMMMMMMMMMMMMMMMMI'ffilMMM 

SEQ ILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEGVHQQYVQRIEKQFLLYAYWIGLG 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh44e^ 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNFPEPPYPDQIICPDEEGTEGTIFL 
SEG xxxxxxxxxxx 
PRD hccccccccceeeeeeeccchhhhhhhhhhhccccccccccccccccccc^ 
MEM 

SEQ WSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGAEPDDEEYQEFEEMLEHAESAQDF 
SEG 
PRD eehhhhhhhhhhhhhccccccccccchhhhhhhhcccccc^^ 
MEM 

SEQ ASRAKLAVQKLVQKVGFFGILACASIPNPLFDLAGITCGHFLVPFWTFFGATLIGKAIIK 
SEG 
PRD hhhhhhhhhhhhhhhcceeeeeeeecccccccccccccccceeeeeeehhhh^^ 
MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ MHIQKIFVIITFSKHIVEQMVAFIGAVPGIGPSLQKPFQEYLEAQRQKLHHKSEMGTPQG 
SEG 
PRD hhhhheeeeeeechhhhhhhhhhhhccccccccchhhhhhhhhhhhhhh^ 
MEM 

SEQ ENWLSWMFEKLVVVMVCYFILSIINSMAQSYAKRIQQRLNSEEKTK 
SEG 
PRD cchhhhhhhhhheeehhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
MEM 


Prosite for DKFZphfkd2_3i13.2 

PS00001   23->27     ASN_GLYCOSYLATION   PDOC00001 
PS00005   69->72     PKC_PHOSPHO_SITE    PDOC00005 
PS00006   29->33     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   215->219   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   236->240   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   120->126   MYRISTYL            PDOC00008 
PS00008   126->132   MYRISTYL            PDOC00008 
PS00008   173->179   MYRISTYL            PDOC00008 
PS00008   195->201   MYRISTYL            PDOC00008 
PS00008   197->203   MYRISTYL            PDOC00008 
PS00008   259->265   MYRISTYL            PDOC00008 
PS00008   275->281   MYRISTYL            PDOC00008 
PS00008   325->331   MYRISTYL            PDOC00008 
PS00008   329->335   MYRISTYL            PDOC00008 
PS00008   356->362   MYRISTYL            PDOC00008 


(No Pfam data available for DKFZphfkd2_3i13.2) 
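
The short PROSITE site hits listed for this peptide can be reproduced with regular expressions. The sketch below covers only the three simplest patterns (PS00001, PS00005, PS00006) and is not the PROSITE scanning tool itself. The peptide string is the 406-residue frame 2 translation with scanning damage restored from the cDNA, and the `start->end` convention (1-based start, end equal to start plus pattern length) is an assumption inferred from the listed ranges. The names `PEPTIDE`, `PATTERNS` and `scan` are hypothetical.

```python
import re

PEPTIDE = (
    "MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREERQNIVLWRQP"
    "LITLQYFSLEILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEGVHQ"
    "QYVQRIEKQFLLYAYWIGLGILSSVGLGTGLHTFLLYLGPHIASVTLAAY"
    "ECNSVNFPEPPYPDQIICPDEEGTEGTIFLWSIISKVRIEACMWGIGTAI"
    "GELPPYFMARAARLSGAEPDDEEYQEFEEMLEHAESAQDFASRAKLAVQK"
    "LVQKVGFFGILACASIPNPLFDLAGITCGHFLVPFWTFFGATLIGKAIIK"
    "MHIQKIFVIITFSKHIVEQMVAFIGAVPGIGPSLQKPFQEYLEAQRQKLH"
    "HKSEMGTPQGENWLSWMFEKLVVVMVCYFILSIINSMAQSYAKRIQQRLN"
    "SEEKTK"
)

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # [ST]-x(2)-[DE]
}

def scan(seq, regex):
    """Return (start, end) pairs, overlapping matches included."""
    hits = []
    # The lookahead lets matches overlap; group 1 holds the matched text.
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1  # 1-based start
        hits.append((start, start + len(m.group(1))))
    return hits

results = {name: scan(PEPTIDE, rx) for name, rx in PATTERNS.items()}
for name, hits in results.items():
    print(name, ", ".join(f"{s}->{e}" for s, e in hits))
```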
DKFZphfkd2_3o17 


group: metabolism 

DKFZphfkd2_3o17 encodes a novel 72 amino acid protein with similarity to Bos taurus 
NADH-ubiquinone oxidoreductase B22 subunit (EC 1.6.5.3) (EC 1.6.99.3). 

NADH:ubiquinone oxidoreductase is the first enzyme in the respiratory electron transport chain 
of mitochondria. It is a membrane-bound multi-subunit protein. The bovine heart enzyme 
contains about 40 different polypeptides. The novel protein is the human orthologue of bovine 
B22. 

The new protein can find application in modulation of the respiratory electron transport chain 
pathways of mitochondria. 


strong similarity to bovine NADH-UBIQUINONE OXIDOREDUCTASE B22 subunit 

complete cDNA, complete cds, EST hits, 

in frame stop codon at -274 will be checked 

ESTs HS1291620/AA883920 show no stop codon at this site 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 693 bp 

Poly A stretch at pos. 670, polyadenylation signal at pos. 659 


1 CAGCAGGCGT GCAGTTTCCC GGCTCTCCGC GCGGCCGGGG AAGGTCAGCG 
51 CCGTAATGGC GTTCTTGGCG TCGGGACCCT ACCTGACCCA TCAGCAAAAG 
101 GTGTTGCGGC TTTATAAGCG GGCGCTACGC CACCTCGAGT CGTGGTGCGT 
151 CCAGAGAGAC AAATACCGAT ACTTTGCTTG TTTGATGAGA GCCCGGTTTG 
201 AAGAACATAA GAATGAAAAG GATATGGCGA AGGCCACCCA GCTGCTGAAG 
251 GAGGCCGAGG AAGAATTCTG GTAACGTCAG CATCCACAGC CATACATCTT 
301 CCCTGACTCT CCTGGGGGCA CCTCCTATGA GAGATACGAT TGCTACAAGG 
351 TCCCAGAATG GTGCTTAGAT GACTGGCATC CTTCTGAGAA GGCAATGTAT 
401 CCTGATTACT TTGCCAAGAG AGAACAGTGG AAGAAACTGC GGAGGGAAAG 
451 CTGGGAACGA GAGGTTAAGC AGCTGCAGGA GGAAACGCCA CCTGGTGGTC 
501 CTTTAACTGA AGCTTTGCCC CCTGCCCGAA AGGAAGGTGA TTTGCCCCCA 
551 CTGTGGTGGT ATATTGTGAC CAGACCCCGG GAGCGGCCCA TGTAGAAAGA 
601 GAGAGACCTC ATCTTTCATG CTTGCAAGTG AAATATGTTA CAGAACATGC 
651 ACTTGCCCTA ATAAAAAATC AGTAAAAAAA AAAAAAAAAA AAA 
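
The reported polyadenylation signal can be located by searching the 3' end of the listing for the canonical hexamer AATAAA or the common variant ATTAAA. The sketch below is illustrative only. The helper `find_polya_signals` is hypothetical, the choice of two motifs is an assumption, and the tail is the last line of the listing above.

```python
def find_polya_signals(seq, first_label, motifs=("AATAAA", "ATTAAA")):
    """Return (motif, 1-based position) for every motif occurrence."""
    hits = []
    for motif in motifs:
        i = seq.find(motif)
        while i != -1:
            hits.append((motif, first_label + i))
            i = seq.find(motif, i + 1)  # continue, allowing overlaps
    return sorted(hits, key=lambda hit: hit[1])

# Last line of the listing; its first base carries the label 651.
TAIL = "".join("ACTTGCCCTA ATAAAAAATC AGTAAAAAAA AAAAAAAAAA AAA".split())
hits = find_polya_signals(TAIL, 651)
print(hits, 651 + len(TAIL) - 1)
```

In 1-based numbering the hexamer starts at position 660, one more than the reported 659. That would be consistent with the report using a 0-based offset, although the source does not state its convention.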


BLAST Results 


Entry S28256 from database PIR: 

NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 
>TREMBL:MIBTCIB22_1 gene: "CI-B22"; product: "NADH-ubiquinone 
oxidoreductase complex B22 subunit"; B. taurus mitochondrion cI-B22 
mRNA for B22 subunit of the NADH-ubiquinone oxidoreductase complex 
Score = 933, P = 5.2e-93, identities = 163/179, positives = 172/179, 
frame +2 


Medline entries 


92389317 

Sequences of 20 subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria. 
Application of a novel strategy for sequencing proteins using the polymerase chain reaction. 

Peptide information for frame 2 


ORF from 56 bp to 271 bp; peptide length: 72 
Category: strong similarity to known protein 


1 MAFLASGPYL THQQKVLRLY KRALRHLESW CVQRDKYRYF ACLMRARFEE 

51 HKNEKDMAKA TQLLKEAEEE FW*RQHPQPY IFPDSPGGTS YERYDCYKVP 

101 EWCLDDWHPS EKAMYPDYFA KREQWKKLRR ESWEREVKQL QEETPPGGPL 

151 TEALPPARKE GDLPPLWWYI VTRPRERPM 
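
The 72-residue peptide can be checked by translating the reported ORF (56 to 271 bp) with the standard genetic code. This is a minimal sketch, not the tool used to produce the report. It assumes 1-based inclusive coordinates, uses the first 300 bases of the cDNA listing for this clone, and the names `CODON_TABLE`, `CDNA` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases 1-300 of the cDNA listing for this clone.
CDNA = (
    "CAGCAGGCGTGCAGTTTCCCGGCTCTCCGCGCGGCCGGGGAAGGTCAGCG"
    "CCGTAATGGCGTTCTTGGCGTCGGGACCCTACCTGACCCATCAGCAAAAG"
    "GTGTTGCGGCTTTATAAGCGGGCGCTACGCCACCTCGAGTCGTGGTGCGT"
    "CCAGAGAGACAAATACCGATACTTTGCTTGTTTGATGAGAGCCCGGTTTG"
    "AAGAACATAAGAATGAAAAGGATATGGCGAAGGCCACCCAGCTGCTGAAG"
    "GAGGCCGAGGAAGAATTCTGGTAACGTCAGCATCCACAGCCATACATCTT"
)

def translate(seq, start_bp, end_bp):
    """Translate seq[start_bp..end_bp] (1-based, inclusive) codon by codon."""
    orf = seq[start_bp - 1:end_bp]
    usable = len(orf) - len(orf) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))

peptide = translate(CDNA, 56, 271)
stop_codon = CDNA[271:274]  # the three bases right after the ORF
print(peptide, len(peptide), stop_codon)
```

The codon immediately after the ORF is TAA, which corresponds to the `*` shown in the peptide listing after residue 72.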
BLASTP hits 

Sequences producing significant alignments: (bits) Value 

sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE... 141 7e-34 
tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO NADH-UBIQ... 53 3e-07 

>sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE 
OXIDOREDUCTASE B22 SUBUNIT (EC 1.6.5.3) (EC 1.6.99.3) 
(COMPLEX I-B22) (CI-B22). [BOS TAURUS] 
Length = 178 

Score = 141 bits (351), Expect = 7e-34 
Identities = 63/71 (88%), Positives = 68/71 (95%) 

Query: 2 AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT 61 

AFL+SG YLTHQQKVLRLYKRALRHLESWC+ RDKYRYFACL+RARF+EHKNEKDM KAT 
Sbjct: 1 AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT 60 

Query: 62 QLLKEAEEEFW 72 

QLL+EAEEEFW 
Sbjct: 61 QLLREAEEEFW 71 


>tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO 
NADH-UBIQUINONE OXIDOREDUCTASE B22. [CAENORHABDITIS 
ELEGANS] 
Length = 163 

Score = 52.7 bits (124), Expect = 3e-07 

Identities = 25/64 (39%), Positives = 41/64 (64%), Gaps = 1/64 (1%) 

Query: 10 LTHQQKVLRLYKRALRHLESWCVQRD-KYRYFACLMRARFEEHKNEKDMAKATQLLKEAE 68 

L+H+QKV RLYKR LR +++W + + R+ C++RARF-*- + +E D K+ LL + 
Sbjct: 12 LSHRQKVTRLYKRCLREVDNWYGGNNLEVRFQKCIIRARFDANADBVDTRKSQILLADGC 71 

Query: 69 EEFW 72 
+ W 

Sbjct: 72 RQLW 75 

Alert BLASTP hits for DKFZphfkd2_3o17, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_3o17, frame 2 


Report for DKFZphfkd2_3o17.2 


[LENGTH] 72 
[MW] 8839.28 
[pI] 9.26 
[HOMOL] PIR:S28256 NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 2e-34 
[KW] All_Alpha 


SEQ MAFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKA 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhcchhhhhhh 

SEQ TQLLKEAEEEFW 

PRD hhhhhhhhhccc 


(No Prosite data available for DKFZphfkd2_3o17.2) 
(No Pfam data available for DKFZphfkd2_3o17.2) 
DKFZphfkd2_46a6 


group: kidney derived 

DKFZphfkd2_46a6 encodes a novel 315 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="228.6 cR from top of Chr15 linkage group" 
Insert length: 2774 bp 

Poly A stretch at pos. 2751, polyadenylation signal at pos. 2732 


1 CTCGCGAGCG CAGCTATGGC TGCTGGCGTA CCCTGTGCGT TAGTCACCAG 
51 CTGCTCCTCC GTCTTCTCAG GAGACCAGCT GGTCCAACAT ACCCTTGGAA 
101 CAGAAGATCT TATTGTGGAA GTGACTTCCA ATGATGCTGT GAGATTTTAT 
151 CCCTGGACCA TTGATAATAA ATACTATTCA GCAGACATCA ATCTATGTGT 
201 GGTGCCAAAC AAATTTCTTG TTACTGCAGA GATTGCAGAA TCTGTCCAAG 
251 CATTTGTGGT TTACTTTGAC AGCACACGAA AATCGGGCCT TGATAGTGTC 
301 TCCTCATGGC TTCCACTGGC AAAAGCATGG TTACCTGAGG TGATGATCTT 
351 GGTCTGCGAT AGAGTGTCTG AAGATGGTAT AAACCGACAA AAAGCTCAAG 
401 AATGGAGCCT CAAACATGGC TTTGAATTGG TAGAACTTAG TCCAGAGGAG 
451 TTGCCTGAGG AGGATGATGA CTTCCCAGAA TCTACAGGAG TAAAGCGAAT 
501 TGTCCAAGCC CTGAATGCCA ATGTGTGGTC CAATGTAGTG ATGAAGAATG 
551 ATAGGAACCA AGGCTTTAGC CTTCTCAACT CATTGACTGG AACAAACCAT 
601 AGCATTGGGT CAGCAGATCC CTGTCACCCA GAGCAACCCC ATTTGCCAGC 
651 AGCAGATAGT ACTGAATCCC TCTCTGATCA TCGGGGTGGT GCATCTAACA 
701 CAACAGATGC CCAGGTTGAT AGCATTGTGG ATCCCATGTT AGATCTGGAT 
751 ATTCAAGAAT TAGCCAGTCT TACCACTGGA GGAGGAGATG TGGAGAATTT 
801 TGAAAGACCC TTTTCAAAGT TAAAGGAAAT GAAAGACAAG GCTGCGACGC 
851 TTCCTCATGA GCAAAGAAAA GTGCATGCAG AAAAGGTGGC CAAAGCATTC 
901 TGGATGGCAA TCGGGGGAGA CAGAGATGAA ATTGAAGGCC TTTCATCTGA 
951 TGGAGAGCAC TGAATTATTC ATACTAGGGT TTGACCAACA AAGATGCTAG 
1001 CTGTCTCTGA GATACCTCTC TACTCAGCCC AGTCATATTT TGCCAAAATT 
1051 GCCCTTATCA TGTTGGCTGC CTGACTTGTT TATAGGGTCC CCTTAATTTT 
1101 AGTTTTTAGT AGGAGGTTAA GGAGAAATCT TTTTTTTCCT CAGTATATTG 
1151 TAAGAGAGTG AGGAATACAG TGATAGTAAT GAGTGAGGAT TTCTTAAATA 
1201 TACTTTTTTT TTGTTCTAGG AATGAGGGTA GGATAAATCT CAGAGGTCTG 
1251 TGTGATTTAC TCAAGTTGAA GACAACCTCC AGGCCATTCC TGGTCAACCT 
1301 TTTAAGTAGC ATTTCCAGCA TTCACACTTG ATACTGCACA TCAGGAGTTG 
1351 TGTCACCTTT CCTGGGTGAT TTGGGTTTTC TCCATTCAAG GAGCTTGTAG 
1401 CTCTGAGCTA TGATGCTTTT ATTGGGAGGA AAGGAGGCAG CTGCAGAATT 
1451 GATGTGAGCT ATGTGGGGCC GAAGTCTCAG CCCGCAGCTA AGTCTCTACC 
1501 TAAGAAAATG CCTCTGGGCA TTCTTTTGAA GTATAGTGTC TGAGCTCATG 
1551 CTAGAAAGAA TCAAAAAGCC AGTGTGGATT TTTAGGCTGT AATAAATGAG 
1601 GCAAAGGATT TCTATTCCAG TGGGAAGGAA ACCTCTCTAC TGAGTTGTGG 
1651 GGGATATGTT GTATGTTAGA GAGAACCTTA AGGAGTCCTT GTATGGGCCA 
1701 TGGAGACAGT ATGTGATAAC ATACCGTGAT TTTCATGAAG AAATTCTTCT 
1751 GTCCTAGAGT TCTCCCCTGC TGCTTGAGAT GCCAGAGCTG TGTTGTTGCA 
1801 CACCTGCAAA ACAAGGCACA TTTCCCCCTT TCTCTTTAAA GCCAAAGAGA 
1851 GATCACTGCC AAAGTGGGAG CACTAAGGGG TGGGTGGGGA AGTGAAATGT 
1901 TAGGCGATGA ATTCCTGAGC ACCTTGTTTT TCTTCCAAGG TTCGTAGCTC 
1951 CTCTCTGCCC TTCCAAGCCT GTAACCTCGG AGGACTATCT TTTGTTCTCT 
2001 ATCCTTTGTC TTGTTAGAGT GGGTCAGCCC CAGAGGAACT GATAAGCAAA 
2051 TGGCAAGTTT TTAAAGGAAG AGTGGAAAGT ACTGCAAATA AAAATCCTTA 
2101 TTTGTTTTTG TAGACTTTGT AATGCATATC ATTAGCCCTC ACTGTGATCA 
2151 TTACTGCTGT GGCTCTGAAC TGGCACATAG TACAGTGGAT GGAAGGTGCC 
2201 CGCACACCAG CTGAGAACTG GTTCTGGCCT AGGTGGGCTC TAGAACCATT 
2251 TACACAGCAT GAAAGAAACA GGTTGGGTTA GGAGCAGAAA GAAATAAGGC 
2301 TCACACCCCT CCAGACACTA CCTTATAAGC ACTGCAGAAC CTGAAACAGA 
2351 TGGCAGAAGG AATGGAATGC TACAGGGGCC AGCAGGAGTG ACCACAGGGA 
2401 GGGGACAGCT CAGTGACTGG AGCATTCAGG AAGAGGCTTT CCAGGGAACA 
2451 CTGGACATTG CTTAGTGACC TTTTGTTCCT TTTTTTTTTT TTTTCTTTTA 
2501 CTGTTCTGAA AGACTTTGAG TCTGTGGTTC ACCACCAGCC CATCAGTGTT 
2551 TCTTTGAGGT GATTGCATTA GGGAAGTTGG CTCTGGGATT GCAAAAAAAA 
2601 AAAAAAGGTG GAACATGTTT TCCTTAAAAG ATGGAAGGTT TTAGAAAATA 
2651 TACTAGGCCA TCTGGTTAGA AAAAACAGAC CAGACTAGAA AAAGCTGTGA 
2701 ATTTGATTTT GTAGATTAAA CAAAGCCAGA TGATTAAAAT GTGATTTATT 
2751 TATAAAAAAA AAAAAAAAAA AAAA 


BLAST Results 


Entry HS463358 from database EMBL: 
human STS WI-14364. 
Length = 472 
Minus Strand HSPs: 

Score = 1605 (240.8 bits), Expect = 5.0e-68, P = 5.0e-68 
Identities = 347/361 (96%) 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 16 bp to 960 bp; peptide length: 315 
Category: putative protein 
Classification: unset 


1 MAAGVPCALV TSCSSVFSGD QLVQHTLGTE DLIVEVTSND AVRFYPWTID 
51 NKYYSADINL CVVPNKFLVT AEIAESVQAF VVYFDSTRKS GLDSVSSWLP 
101 LAKAWLPEVM ILVCDRVSED GINRQKAQEW SLKHGFELVE LSPEELPEED 
151 DDFPESTGVK RIVQALNANV WSNVVMKNDR NQGFSLLNSL TGTNHSIGSA 
201 DPCHPEQPHL PAADSTESLS DHRGGASNTT DAQVDSIVDP MLDLDIQELA 
251 SLTTGGGDVE NFERPFSKLK EMKDKAATLP HEQRKVHAEK VAKAFWMAIG 
301 GDRDEIEGLS SDGEH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46a6, frame 1 

PIR:T04362 probable GTP-binding protein yptm3 - maize, N = 1, Score = 
87, P = 0.21 

PIR:S71585 GTP-binding protein GB2 - Arabidopsis thaliana, N = 1, Score 
= 86, P = 0.27 


>PIR:T04362 probable GTP-binding protein yptm3 - maize 
Length = 210 

HSPs: 

Score = 87 (13.1 bits), Expect = 2.4e-01, P = 2.1e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 48 TIDNKYYSADINLCVVPNKFL-VTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWL 106 

TIDNK I F +T ++ +D TR+ + ++SWL A+ 

Sbjct: 49 TIDNKPIKLQIWDTAGQESFRSITRSYYRGAAGALLVYDITRRETFNHLASWLEDAROHA 108 

Query: 107 PE VMIL— VCDRVSEDGINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKR 161 

VM++ CD ++ ++ ++++ +HG +E S + ++ F ++ G 

Sbjct: 109 NANMTVMLIGNKCDLSHRRAVSYEEGEQFAKEHGLVFMEASAKTAQNVEEAFIKTAGT— 166 

Query: 162 IVQALNANVWSNVVMKNDRNQGFSLLNSLTGTNHSIGSADPC 203 

I + + ++ N G+++ NS G S AC 

Sbjct: 167 IYKKIQDGIFDVSNESNGIKVGYAVPNSSGGGAGSSSQAGGC 208 


Pedant information for DKFZphfkd2_46a6, frame 1 


Report for DKFZphfkd2_46a6.1 


[LENGTH] 315 




[MW] 34505.54 

[pI] 4.55 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.67 % 

SEQ MAAGVPCALVTSCSSVFSGDQLVQHTLGTEDLIVEVTSNDAVRFYPWTIDNKYYSADINL 

SEG 

PRD cccccceeeeecccccccccceeeeccccceeeeeeccccceeeecccccccccccccee 

SEQ CVVPNKFLVTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWLPEVMILVCDRVSED 

SEG 

PRD eeecccchhhhhhhhhhheeeeeeecccccccccccccccccccccccceeeeccccccc 

SEQ GINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKRIVQALNANVWSNVVMKNDR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhcccceeeeccccccccccccccccccchhhhhhhhcccceeeeeeccc 

SEQ NQGFSLLNSLTGTNHSIGSADPCHPEQPHLPAADSTESLSDHRGGASNTTDAQVDSIVDP 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccch 

SEQ MLDLDIQELASLTTGGGDVENFERPFSKLKEMKDKAATLPHEQRKVHAEKVAKAFWMAIG 

SEG 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhc 

SEQ GDRDEIEGLSSDGEH 
SEG 

PRD ccccccccccccccc 

(No Prosite data available for DKFZphfkd2_46a6.1) 
(No Pfam data available for DKFZphfkd2_46a6.1) 
DKFZphfkd2_46b10 


group: kidney derived 

DKFZphfkd2_46b10.1 encodes a novel 336 amino acid protein with similarity to C. elegans cosmid 
F25B5.3. 

The novel protein contains a HTH_LYSR_FAMILY PROSITE pattern. Proteins of the lysR family are 
bacterial transcriptional regulatory proteins which bind DNA using a helix-turn-helix motif. 
Most of these proteins are transcription activators and usually negatively regulate their own 
expression. They all possess a potential 'helix-turn-helix' DNA-binding motif in their 
N-terminal section. The 'helix-turn-helix' motif is missing in DKFZphfkd2_46a6.1. 
No informative BLAST results; no predictive PFAM or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 

similarity to C.elegans F25B5.3 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1285 bp 

Poly A stretch at pos. 1266, no polyadenylation signal found 

1 CAGTCTACGC GAGCTGCCTG TTTTTTTCCT GCTTGGACGC GCATGAGGGC 
51 CCCGTCCATG GACCGCGCGG CCGTGGCGAG GGTGGGCGCG GTAGCGAGCG 

101 CCAGCGTGTG CGCCCTGGTG GCGGGGGTGG TGCTGGCTCA GTACATATTC 
151 ACCTTGAAGA GGAAGACGGG GCGGAAGACC AAGATCATCG AGATGATGCC 

201 AGAATTCCAG AAAAGTTCAG TTCGAATCAA GAACCCTACA AGAGTAGAAG 

251 AAATTATCTG TGGTCTTATC AAAGGAGGAG CTGCCAAACT TCAGATAATA 

301 ACGGACTTTG ATATGACACT CAGTAGATTT TCATATAAAG GGAAAAGATG 

351 CCCAACATGT CATAATATCA TTGACAACTG TAAGCTGGTT ACGGATGAAT 

401 GTAGAAAAAA GTTATTGCAA CTAAAGGAAA AATATTACGC TATTGAAGTT 

451 GATCCTGTTC TTACTGTAGA AGAGAAGTAC CCTTATATGG TGGAATGGTA 

501 TACTAAATCA CATGGTTTGC TTGTTCAGCA AGCTTTACCA AAAGCTAAAC 

551 TTAAAGAAAT TGTGGCAGAA TCTGACGTTA TGCTCAAAGA AGGATATGAG 

601 AATTTCTTTG ATAAGCTCCA ACAACATAGC ATCCCCGTGT TCATATTTTC 

651 GGCTGGAATC GGCGATGTAC TAGAGGAAGT TATTCGTCAA GCTGGTGTTT 

701 ATCATCCCAA TGTCAAAGTT GTGTCCAATT TTATGGATTT TGATGAAACT 

751 GGGGTGCTCA AAGGATTTAA AGGAGAACTA ATTCATGTAT TTAACAAACA 

801 TGATGGTGCC TTGAGGAATA CAGAATATTT CAATCAACTA AAAGACAATA 

851 GTAACATAAT TCTTCTGGGA GACTCCCAAG GAGACTTAAG AATGGCAGAT 

901 GGAGTGGCCA ATGTTGAGCA CATTCTGAAA ATTGGATATC TAAATGATAG 

951 AGTGGATGAG CTTTTAGAAA AGTACATGGA CTCTTATGAT ATTGTTTTAG 
1001 TACAAGATGA ATCATTAGAA GTAGCCAACT CTATTTTACA GAAGATTCTA 
1051 TAAACAAGCA TTCTCCAAGA AGACCTCTCT CCTGTGGGTG CAATTGAACT 
1101 GTTCATCCGT TCATCTTGCT GAGAGACTTA TTTATAATAT ATCCTTACTC 
1151 TCGAAGTGTT CCCTTTGTAT AACTGAAGTA TTTTCAGATA TGGTGAATGC 
1201 ATTGACTGGA AGCTCCTTTT CTCCACCTCT CTCAACACAC TCCTCACCGT 
1251 ATCTTTTAAC CCATTTAAAA AAAAAAAAAA AAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 


Peptide information for frame 1 


ORF from 43 bp to 1050 bp; peptide length: 336 
Category: similarity to unknown protein 
Classification: unset 

Prosite motifs: HTH_LYSR_FAMILY (16-47) 

1 MRAPSMDRAA VARVGAVASA SVCALVAGVV LAQYIFTLKR KTGRKTKIIE 
51 MMPEFQKSSV RIKNPTRVEE IICGLIKGGA AKLQIITDFD MTLSRFSYKG 
101 KRCPTCHNII DNCKLVTDEC RKKLLQLKEK YYAIEVDPVL TVEEKYPYMV 
151 EWYTKSHGLL VQQALPKAKL KEIVAESDVM LKEGYENFFD KLQQHSIPVF 
201 IFSAGIGDVL EEVIRQAGVY HPNVKVVSNF MDFDETGVLK GFKGELIHVF 
251 NKHDGALRNT EYFNQLKDNS NIILLGDSQG DLRMADGVAN VEHILKIGYL 
301 NDRVDELLEK YMDSYDIVLV QDESLEVANS ILQKIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46b10, frame 1 

SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III., N = 1, Score = 524, P = 2.2e-50 

TREMBL:AC005499_12 gene: "T6A23.12"; Arabidopsis thaliana chromosome 
II BAC T6A23 genomic sequence, complete sequence., N = 2, Score = 194, 


>SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
Length = 376 

HSPs: 

Score = 524 (78.6 bits), Expect = 2.2e-50, P = 2.2e-50 
Identities = 112/300 (37%), Positives = 174/300 (58%) 

Query: 44 RKTKIIEMMPEFQ~KSSVRIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYK-G 100 

"^^"^ 4+ ++ + + + + +PT V + ++ GGA K +I+DrD TLSRF+ + G 
Sbjct: 73 KKTDWPLLMNYLLGEEQILVADPTAVAAKLRKMWGGAGKTWISDFDYTLSRFANEQG 132 

Query: 101 KRCPTCHNIID-NCKLVTDECRKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGL 159 

T H + D N + E +K + LK KYY IE P LT+EEK P+M +W+ SH L 
Sbjct: 133 ERLSTTHGVFDDNVMRLKPELGQKFVDLKNKYYPIEFSPNLTMEEKIPHMEKWWGTSHSL 192 

Query: 160 LVQQALPKAKLKEIVAESDVMLKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQA-G 218 

+V + K +++ V +S ++ K+G E+F + L H+IP+ IFSAGIG+++E ++Q G 
Sbjct: 193 IVNEKFSKNTIEDFVRQSRIVFKOGAEDFIEALDAHNIPLVIFSAGIGNIIEYFLQQKLG 252 

Query: 219 VYHPNVKVVSNFMDFDETGVLKGFKGELIHVFNKHDGAL-RNTEYFNQLKDNSNIILLGD 277 

N +SN + FDE F LIH F K+ + + T +F+ + N+ILLGD 

Sbjct: 253 AIPRNTHFISNMILFDEDDNACAFSEPLIHTFCKNSSVIQKETSFFHDIAGRVNVILLGD 312 

Query: 278 SQGDLRMADGVANVEHILKIGYLNDRVDEL--LEKYMDSYDIVLVQDESLEVANSILQKI 335 

S GD+ M GV LK+GY N +D+ L+ Y + YDIVL+ D +L VA I+ I 

Sbjct: 313 SMGDIHMDVGVERDGPTLKVGYYNGSLDDTAALQHYEEVYDIVLIHDPTLNVAQKIVDII 372 


Pedant information for DKFZphfkd2_46b10, frame 1 


Report for DKFZphfkd2_46b10.1 


[LENGTH] 336 

[MW] 37948.37 

[pI] 6.67 

[HOMOL] SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME III. 3e-51 

[PROSITE] HTH_LYSR_FAMILY 1 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 7.44 % 


SEQ MRAPSMDRAAVARVGAVASASVCALVAGVVLAQYIFTLKRKTGRKTKIIEMMPEFQKSSV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccchhhhhcchhhhhhheeehhhhhhhhhhhhhhhhhhhhhccceeeehhhhhh^^ 

MEM MMMMMMMMMMMMMMMMMMMM«!MMMMMMMM 

SEQ RIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYKGKRCPTCHNIIDNCKLVTDEC 

SEG 

PRD eecccchhhhhhhhhhccccceeeeecccccceeeecccccccccccccccccchhhhhh 

MEM 

SEQ RKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGLLVQQALPKAKLKEIVAESDVM 

SEG 

PRD hhhhhhhhhhhheeeccccccccccchhhhhhccccchhhhhhccchhhhhhhhhhhhcc 

MEM 

SEQ LKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQAGVYHPNVKVVSNFMDFDETGVLK 

SEG 

PRD ccccchhhhhhhhhcccceeeeecccchhhhhhhhhhcccccceeeeeecccccccccee 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ GFKGELIHVFNKHDGALRNTEYFNQLKDNSNIILLGDSQGDLRMADGVANVEHILKIGYL 

SEG 

PRD eccceeeeeeecccccccccchhhhhhhhceeeeecccccccccccccccccceeeeeec 

MEM 

SEQ NDRVDELLEKYMDSYDIVLVQDESLEVANSILQKIL 

SEG 

PRD cchhhhhhhhhhhhheeeeeecchhhhhhhhhhccc 

MEM 


Prosite for DKFZphfkd2_46b10.1 

PS00044 16->47 HTH_LYSR_FAMILY PDOC00043 

(No Pfam data available for DKFZphfkd2_46b10.1) 

DKFZphfkd2_46d13 


group: kidney derived 

DKFZphfkd2_46d13 encodes a novel 506 amino acid protein with weak similarity to KE03 protein. 
The novel protein contains an RGD site. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 


similarity to KE03 protein 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: /map="227.6 cR from top of Chr1 linkage group" 
Insert length: 3346 bp 

Poly A stretch at pos. 3328, polyadenylation signal at pos. 3308 


1 CTCTCGCGAG AGGAGCAAGA GGAAGATGGC CGTGCCCTGT TTTTCGGTGT 
51 AAGGCAGCAG ACGGCGGCTG CGACGGCGAG ACTGAGATCC TGGTGTCGTG 
101 GGCACCTGAG TTCTAGCTTC CCCCAGCGAG CGCGCGTCCC TTCGTGCCTA 
151 GGCGAGAGCC GGCTCTTCCC CGGGAGATGC GTTTGTCCCA GGCTCGGGGG 
201 CTCAGTGGGA GTTCATGCTG CGCTGGAGGC TCTTGGCCAC CGCTCTAATC 
251 GCCTTGTGCC GCCGCAGCGC CAGCTCCGTC GCCAGCGGTG AGCCTCCCGA 
301 TTCCCCCCCT TGCCCCTGGC GGCGGCGATG ACCGGGGAGA AGATCCGCTC 
351 ACTGCGGAGG GACCACAAGC CCAGCAAAGA AGAAGGGGAC CTGCTGGAGC 
401 CCGGGGATGA AGAAGCGGCG GCTGCCCTCG GCGGTACCTT TACCAGAAGC 
451 AGGATTGGCA AGGGCGGCAA AGCTTGTCAT AAGATCTTCA GTAACCATCA 
501 CCACCGGCTA CAGCTGAAGG CAGCTCCGGC CTCCTCCAAT CCCCCCGGCG 
551 CCCCGGCTCT GCCGCTGCAC AATTCCTCCG TGACTGCCAA CTCCCAGTCC 
601 CCGGCCCTTC TGGCCGGCAC CAACCCCGTT GCTGTCGTCG CGGATGGAGG 
651 CAGTTGCCCC GCACACTACC CGGTGCACGA GTGCGTCTTC AAGGGGGATG 
701 TGAGGAGACT CTCCTCTCTC ATCCGCACGC ACAATATCGG GCAGAAAGAT 
751 AATCACGGAA ATACTCCTTT ACACCTTGCT GTGATGTTAG GAAATAAAGT 
801 TACAGCTCTT TTGAGGAAGC TTAAGCAGCA ATCCAGGGAA AGTGTTGAAG 
851 AAAAACGACC TCGATTATTA AAAGCCCTGA AAGAGCTAGG TGACTTTTAT 
901 CTAGAACTTC ACTGGGATTT TCAAAGCTGG GTGCCTTTAC TTTCCCGAAT 
951 TCTGCCTTCC GATGCATGTA AAATATACAA ACAAGGTATC AATATCAGGC 
1001 TTGACACAAC TCTCATAGAC TTTACTGACA TGAAGTGCCA ACGAGGGGAT 
1051 CTAAGCTTCA TTTTCAATGG GGATGCGGCG CCCTCTGAAT CTTTTGTAGT 
1101 ATTAGACAAT GAACAAAAAG TTTATCAGCG AATACATCAT GAGGAATCAG 
1151 AGATGGAAAC AGAAGAAGAG GTGGATATTT TAATGAGCAG TGATATTTAC 
1201 TCTGCAACTT TATCAACAAA ATCAATTTCT TTCACGCGTG CCCAGACAGG 
1251 ATGGCTTTTT CGGGAAGATA AAACAGAAAG AGTAGGAAAC TTTTTGGCAG 
1301 ACTTTTACCT GGTGAATGGA CTTGTTATAG AATCAAGGAA AAGAAGAGAA 
1351 CATCTCAGTG AAGAGGATAT TCTTCGAAAT AAGGCCATCA TGGAGAGTTT 
1401 GAGTAAAGGT GGAAACATAA TGGAACAGAA TTTTGAGCCG ATTCGAAGAC 
1451 AGTCTCTTAC ACCGCCTCCT CAGAACACTA TTACATGGGA AGAATATATA 
1501 TCTGCTGAAA ATGGAAAAGC TCCTCATCTG GGTAGAGAAT TGGTGTGCAA 
1551 AGAGAGTAAG AAAACGTTTA AAGCTACGAT AGCCATGAGC CAGGAATTTC 
1601 CCTTAGGGAT AGAGTTATTA TTGAATGTTT TAGAAGTAGT AGCTCCCTTC 
1651 AAGCACTTTA ACAAGCTTAG AGAATTTGTT CAGATGAAGC TTCCTCCAGG 
1701 CTTTCCTGTA AAATTAGATA TACCTGTGTT TCCCACAATC ACAGCCACTG 
1751 TGACTTTTCA GGAGTTTCGA TACGATGAAT TTGATGGCTC CATCTTTACT 
1801 ATACCTGATG ACTACAAGGA AGACCCAAGC CGTTTTCCTG ATCTTTAACT 
1851 GACGTGGAAA AGGATGCCGT CTAACCAAGG AAAGAAAATA CAGAGACCCT 
1901 AGAAGTGGAT CCAAATAGAA GGGACAAATG CTTTCAGTGA AGAAAAGGGA 
1951 ATTACACATT GAATCGACAC ATCAGTAATA CGATACAGTG AAATGGGCCT 
2001 CTAATAAGAA TTTCAGCGAG TTTTCTGATG TGCCATTTTT TGTCTTTTTA 
2051 AAAATATACA TATTATAAAT GTAATAGTTT GACACATTAA TGACCCTAAG 
2101 ACCTGCGTAT GTGAAGCAGC TATGAGTGCT GTGATTTGTT TTTAAAAATT 
2151 TTTACACTTC TTGTTGAAAT ATATATGCAT ATAAATATAT CTATATCTAT 
2201 ATCTATATCT AAAACACTCC TGGACCATTA ACGTAAATTA AATGTCTTAA 
2251 GAGATATGGA GCCCTTTTAA ACTTGTCATC TTTATGCAAG GTGACATTTA 
2301 TAAATATTCC TTCGAGCTTT GTTTTCATAA AATGTAAACT ATGTAACATT 
2351 ATGTATAGTT CAGTAATTTG AATGTTTGTT CAATATAATG AACTAGAAGG 
2401 AATGCAATTT TCTGTAGATG AATGAACCAA ATGGTAACCA TTAAACAATT 
2451 GCATTTATAT GTTGCAATAC ATTTCAGAAG GAGCGTTCAC TCTGCAGGGA 
2501 ATAAGGTACC TCCTTTAGCA CCTTAGTGCA ATTCATTGTG GTGCTATTTG 
2551 TTTTTACCTG AATGTTTGTT ACTAATCTTC CTTTCATAGA ACCTCTATTT 
2601 TTTTTTTTTC TAAACTTGAG TTTGAGTCCT TGTTATGGTC ATCATAAGGT 
2651 AATGGTTAGC ATGTTTAAAG ATATTCCTCT TCCAAATCTC AGCACTTTAA 
2701 AAAAAAATCC AAATTTTTAA ACTTGCTTCC TAATAAGTAC ACATCGGTCT 
2751 GATTATTTTG TTTGTTTTTA GTAGAATATG GATGCATTGG TGTCAGTTTT 
2801 AAAAAACAAT ACACATATTT TGGACAACCC TACATATTTA ATCCTTTCAA 
2851 AATAAGATAA AAACATTTTA TATGCTAACA GAATATATTT GTTACAAGTT 
2901 AAAGTCCAGA AGTATACACA AGATTGATTA CTCCTATTAT TTTTTTTAAA 
2951 TCACAGGAAA ATATTGATTT CATTGTCTCC AAAGTGATAA AATCTTGTAT 
3001 TACTCATTTT TGCACTTAAA ATTTTTCTTA TTTATTCCAA GGTGGTTTGA 
3051 AGGTCCAAGT ATGAAAATAA ATTAGGGGGA TTAATGTATA ACAGTTATAA 
3101 AGTATCATGT TGTATTAAAG AGCTTACTTA GATTGATGTT TTTAAAATGT 
3151 ATCCTGATGA ATGTCTCAAG AATGCATCTG TCAAGTTTTT TAGACTGACC 
3201 AGTAGCTTAA ACTTTTTTCA GGATTTTAGG TAATTTGAAA GGAGTTTAGA 
3251 GACCCTTATT GAAAATATGA TTTAAAAATC CAAAGCATAA ACCGTAAGAA 
3301 AAATTTTAAA TAAACATCTT TAAAGCTGAA AAAAAAAAAA AAAAAA 


BLAST Results 


Entry HS121353 from database EMBL: 
human STS WI-14729. 
Score = 1697, P = 1.9e-69, identities = 363/379 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 328 bp to 1845 bp; peptide length: 506 
Category: similarity to unknown protein 


1 MTGEKIRSLR RDHKPSKEEG DLLEPGDEEA AAALGGTFTR SRIGKGGKAC 

51 HKIFSNHHHR LQLKAAPASS NPPGAPALPL HNSSVTANSQ SPALLAGTNP 

101 VAVVADGGSC PAHYPVHECV FKGDVRRLSS LIRTHNIGQK DNHGNTPLHL 

151 AVMLGNKVTA LLRKLKQQSR ESVEEKRPRL LKALKELGDF YLELHWDFQS 

201 WVPLLSRILP SDACKIYKQG INIRLDTTLI DFTDMKCQRG DLSFIFNGDA 

251 APSESFVVLD NEQKVYQRIH HEESEMETEE EVDILMSSDI YSATLSTKSI 

301 SFTRAQTGWL FREDKTERVG NFLADFYLVN GLVIESRKRR EHLSEEDILR 

351 NKAIMESLSK GGNIMEQNFE PIRRQSLTPP PQNTITWEEY ISAENGKAPH 

401 LGRELVCKES KKTFKATIAM SQEFPLGIEL LLNVLEVVAP FKHFNKLREF 

451 VQMKLPPGFP VKLDIPVFPT ITATVTFQEF RYDEFDGSIF TIPDDYKEDP 
501 SRFPDL 


BLASTP hits 


Entry CEC01F1_3 from database TREMBL: 

gene: "C01F1.6"; Caenorhabditis elegans cosmid C01F1. 

Score = 371, P = 4.5e-61, identities = 69/138, positives = 96/138 

Entry CEC18F10_9 from database TREMBL: 

gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 

Score = 383, P = 3.4e-39, identities = 103/349, positives = 182/349 

Entry AF064604_1 from database TREMBL: 

product: "KE03 protein"; Homo sapiens KE03 protein mRNA, partial cds. 
Score = 348, P = 8.3e-32, identities = 95/295, positives = 148/295 


Alert BLASTP hits for DKFZphfkd2_46d13, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_46d13, frame 1 


Report for DKFZphfkd2_46d13.1 


[LENGTH] 506 

[MW] 57003.12 

[pI] 6.40 

[HOMOL] TREMBL:CEC18F10_9 gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 2e-35 

[BLOCKS] BL01288E 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 7.51 % 


SEQ MTGEKIRSLRRDHKPSKEEGDLLEPGDEEAAAALGGTFTRSRIGKGGKACHKIFSNHHHR 

SEG xxxxxxxxxxxx 

SEQ FKGDVRRLSSLIRTHNIGQKDNHGNTPLHLAVMLGNKVTALLRKLKQQSRESVEEKRPRL 

PRD eccchhhhhhhhhhcccccccccccccceeeecccchhhhhhhhhhhhcchhhhhhhhhhhh 

SEQ LKALKELGDFYLELHWDFQSWVPLLSRILPSDACKIYKQGINIRLDTTLIDFTDMKCQRG 

PRD hhhhhhccccceeehhhhhccceeeeccccccceeeeeccceeeeeeeeecccccccccc 

SEQ DLSFIFNGDAAPSESFVVLDNEQKVYQRIHHEESEMETEEEVDILMSSDIYSATLSTKSI 

SEG xxxxxxxxxx 

PRD ceeeeeccccceeeeeeeecccceeeehhhhhhhhhhhhhhhhhhhhccceeeecccccc 

SEQ SFTRAQTGWLFREDKTERVGNFLADFYLVNGLVIESRKRREHLSEEDILRNKAIMESLSK 

PRD eeeecccceeeecccchhhhhhheeeeeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhc 

SEQ GGNIMEQNFEPIRRQSLTPPPQNTITWEEYISAENGKAPHLGRELVCKESKKTFKATIAM 

PRD cceeeccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ SQEFPLGIELLLNVLEVVAPFKHFNKLREFVQMKLPPGFPVKLDIPVFPTITATVTFQEF 

PRD hhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeeeeeehhhhhhhcc 

SEQ RYDEFDGSIFTIPDDYKEDPSRFPDL 

PRD cccccccceeeccccccccccccccc 


Prosite for DKFZphfkd2_46d13.1 


PS00001 82->86 ASN_GLYCOSYLATION PDOC00001 
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 373->377 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 336->339 PKC_PHOSPHO_SITE PDOC00005 
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005 
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006 
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006 
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006 
PS00006 278->282 CK2_PHOSPHO_SITE PDOC00006 
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006 
PS00006 386->390 CK2_PHOSPHO_SITE PDOC00006 
PS00006 476->480 CK2_PHOSPHO_SITE PDOC00006 
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006 
PS00008 35->41 MYRISTYL PDOC00008 
PS00008 46->52 MYRISTYL PDOC00008 
PS00008 108->114 MYRISTYL PDOC00008 
PS00008 138->144 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 487->493 MYRISTYL PDOC00008 
PS00016 239->242 RGD PDOC00016 


(No Pfam data available for DKFZphfkd2_46d13.1) 

DKFZphfkd2_46j20 


group: metabolism 

DKFZphfkd2_46j20 encodes a novel 224 amino acid protein similar to 
2-hydroxyhepta-2,4-diene-1,7-dioate isomerase. 

The new protein seems to be the human ortholog of 2-hydroxyhepta-2,4-diene-1,7-dioate 
isomerase. 

The new protein can find application in modulating the homoprotocatechuate degradative pathway 
and as an enzyme for biotechnological production processes. 


strong similarity to 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 

complete cDNA, complete cds, EST hits 

potential start at Bp 16 matches Kozak consensus ANCatgG 

strong similarity to proteins of worm, plant, archaea and bacteria 

2-hydroxyhepta-2,4-diene-1,7-dioate isomerase is part of 
the tyrosine metabolism (degradation of tyrosine, late step), EC 5.3.1.- 

complete cds according to similar C. elegans and A. thaliana proteins 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1706 bp 

Poly A stretch at pos. 1686, polyadenylation signal at pos. 1667 
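
The note that the potential start at Bp 16 matches the consensus ANCatgG can be checked mechanically against the first 20 bases of this insert. The sketch below is illustrative only: the regular expression is a literal reading of "ANCatgG" (A at -3, any base at -2, C at -1, G at +4), the function name `kozak_starts` is hypothetical, and positions are reported 1-based to match "Bp 16".

```python
import re

# Literal reading of the annotation's pattern: A, any base, C, ATG, G.
KOZAK = re.compile(r"A[ACGT]C(ATG)G")


def kozak_starts(seq):
    """Return 1-based positions of ATG codons found in an ANCatgG context."""
    seq = seq.upper()
    return [m.start(1) + 1 for m in KOZAK.finditer(seq)]


first_20_bp = "CACTTGATGGGAATCATGGC"  # bases 1-20 of the insert listed for this clone
print(kozak_starts(first_20_bp))
```

The ATG at bp 7, where the reported ORF begins, is not returned because the base three positions upstream of it is T rather than A.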


1 CACTTGATGG GAATCATGGC AGCATCCAGG CCATTGTCCC GCTTCTGGGA 
51 GTGGGGAAAG AACATCGTCT GCGTGGGGAG GAACTACGCG GACCACGTCA 
101 GGGAGATGCG CAGCGCGGTG TTGAGCGAGC CCGTGCTGTT CCTGAAGCCG 
151 TCCACGGCCT ACGCGCCCGA GGGCTCGCCC ATCCTCATGC CCGCGTACAC 
201 TCGCAACCTG CACCACGAGC TGGAGCTGGG CGTGGTGATG GGCAAGCGCT 
251 GCCGCGCAGT CCCCGAGGCT GCGGCCATGG ACTACGTGGG CGGCTATGCC 
301 CTGTGCCTGG ATATGACCGC CCGGGACGTG CAGGACGAGT GCAAGAAGAA 
351 GGGGCTGCCC TGGACTCTGG CGAAGAGCTT CACGGCGTCC TGCCCGGTCA 
401 GCGCGTTCGT GCCCAAGGAG AAGATCCCTG ACCCTCACAA GCTGAAGCTC 
451 TGGCTCAAGG TCAACGGCGA ACTCAGACAG GAGGGTGAGA CATCCTCCAT 
501 GATTTTTTCC ATCCCCTACA TCATCAGCTA TGTTTCTAAG ATCATAACCT 
551 TGGAAGAAGG AGATATTATC TTGACTGGGA CGCCAAAGGG AGTTGGACCG 
601 GTTAAAGAAA ACGATGAGAT CGAGGCTGGC ATACACGGGC TGGTCAGTAT 
651 GACATTTAAA GTGGAAAAGC CAGAATATTG AGTTATTTCT TAACAAGTTT 
701 CGAGAGAGAA GGGAGCAAGA CAAGAGCAAG CAACGGCTAT TAAATGTCAC 
751 AATCCTTTAA TTAGAAACCA TTTATTGGCC GGACGCGGTG GCTCACGCCT 
801 GTAATCGCAG CACTTTGGGA GGCCGAGGCG GGCGGCTCAC GACGTCAGGA 
851 GATCCAGACC ATCTTGGCTA ACAGGGTGAA ACCCCGTCTC TACTAAAAAT 
901 ACAAAAAATT AGCCGGGCGT GGTGGCGGGC GCCTGTAGTC CCAGCTACTC 
951 TGGAGGCTGA GGCAGGAGAA TCAATTGAAC CCGGGAGGCG GAGCTTACAG 
1001 TGAGCTGAGA TTGCGCCACT GTACTCCTGG GCAACAGCGA GACTCCGTCT 
1051 CAAAAAAAAA AAAAAAAAAA AGAAACCATT TATTTTAAAA ATGATTAGAT 
1101 TGCTATGCCT CAACTCATAG AAGATGAACC CTTCAAGAAA ACGTGAAGTA 
1151 GAACGGGTGG GCCAGAAATG AAAACAGGCA AGTAAAGTAT TTCTTCGGAA 
1201 AACATTTTAT CAAACCAAAT GTTAAAAAGA CTTTCCTTTT GTAAAACTGG 
1251 ATTAGAGAAG ACTTTTCAGT GGGTTATCTC TAGGATGATC AGTAGTTCAG 
1301 CACTTAAAAA CTGCAGAGAA AACTGAAAGT TATGTTCCAG ATAACTTTCC 
1351 GTTGTTTACC AAATTTTCTT AGATTTGGTC ATCATCAGGA AGCATTTGTA 
1401 AAAATAAAAA TCTCCACAAA TTACTGGCCC ATCTCGGACT TGCTGAATCA 
1451 ATTTGATAGG ATTAATCTCC AGTGAAGCTG TGTTTACAGG GCATTCCAAG 
1501 TGATTCTTAT CAGGAAATGT GAAAAACACT CCTGTACATA ATCGGTTAAT 
1551 TTAAAATTTT ACTTAATAAG TGAACAAGTA ATGAAGATTT CACCTGTTTA 
1601 CTTAGGGTAT CTACCCAGAC CCATCGATTC TGAGTTCGGG AGATGATTTT 
1651 GAAATTACTG TTTTCCAAAT AAAGGTGCTC CCTTCCAAAA AAAAAAAAAA 
1701 AAAAAA 
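
The poly(A) annotations for this insert (poly A stretch at pos. 1686, polyadenylation signal at pos. 1667) can be reproduced from the last 56 bases of the sequence above if the positions are read as zero-based offsets. That reading is an inference from the numbers, not something the source states. The sketch below assumes the canonical AATAAA hexamer as the signal; the function name `poly_a_annotations` and its parameters are hypothetical.

```python
def poly_a_annotations(tail, offset):
    """Return (signal_pos, poly_a_pos) as zero-based offsets in the full insert.

    tail   -- the 3' end of the insert, ending in the poly(A) stretch
    offset -- zero-based offset of tail[0] within the full insert
    """
    body = tail.rstrip("A")          # everything before the trailing A run
    poly_a_start = len(body)         # index where the trailing A run begins
    signal = body.rfind("AATAAA")    # last canonical hexamer upstream of the run
    if signal < 0:
        raise ValueError("no AATAAA hexamer found upstream of the poly(A) run")
    return offset + signal, offset + poly_a_start


# Bases 1651-1706 of the insert, i.e. zero-based offset 1650.
tail = "GAAATTACTGTTTTCCAAATAAAGGTGCTCCCTTCCAAAAAAAAAAAAAAAAAAAA"
print(poly_a_annotations(tail, 1650))
```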


BLAST Results 


No BLAST result 


Medline entries 

94039092: Purification, nucleotide sequence and some properties of a bifunctional 
isomerase/decarboxylase from the homoprotocatechuate degradative pathway of Escherichia coli 


Peptide information for frame 1 


ORF from 7 bp to 678 bp; peptide length: 224 
Category: strong similarity to known protein 


1 MGIMAASRPL SRFWEWGKNI VCVGRNYADH VREMRSAVLS EPVLFLKPST 

51 AYAPEGSPIL MPAYTRNLHH ELELGVVMGK RCRAVPEAAA MDYVGGYALC 

101 LDMTARDVQD ECKKKGLPWT LAKSFTASCP VSAFVPKEKI PDPHKLKLWL 

151 KVNGELRQEG ETSSMIFSIP YIISYVSKII TLEEGDIILT GTPKGVGPVK 

201 ENDEIEAGIH GLVSMTFKVE KPEY 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46j20, frame 1 

PIR:S44919 ZK688.3 protein - Caenorhabditis elegans, N = 1, Score = 
537, P = 8.7e-52 

PIR:D71109 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase - 
Pyrococcus horikoshii, N = 1, Score = 529, P = 6.1e-51 

PIR:C71425 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 
519, P = 7e-50 

PIR:A64864 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase b1180 
- Escherichia coli, N = 1, Score = 474, P = 4.1e-45 


>PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 
Length = 214 

HSPs: 

Score = 537 (80.6 bits), Expect = 8.7e-52, P = 8.7e-52 
Identities = 99/211 (46%), Positives = 138/211 (65%) 

Query: 10 LSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPILMPAYTRNLH 69 

L+ F IVCVGRNY DH E+ +A+ +P+LF+K ++ EG PI+ P +NLH 

Sbjct: 4 LAGFRNLATKIVCVGRNYKDHALELGNAIPKKPMLFVKTVNSFIVEGEPIVAPPGCQNLH 63 

Query: 70 HELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWTLAKSFTASC 129 

E+ELGVV+ K+ + ++ AMDY+GGY + LDMTARD QDE KK G PW LAKSF SC 
Sbjct: 64 QEVELGVVISKKASRISKSDAMDYIGGYTVALDMTARDFQDEAKKAGAPWFLAKSFDGSC 123 

Query: 130 PVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKIITLEEGDIIL 189 

P+ F+P IP+PH ++L+ K+NG+ +Q T MIF IP ++ Y ++ TLE GD++L 
Sbjct: 124 PIGGFLPVSDIPNPHDVELFCKINGKDQQRCRTDVMIFDIPTLLEYTTQPFTLEVGDVVL 183 

Query: 190 TGTPKGVGPVKENDEIEAGIHGLVSMTFKVE 220 

TGTP GV + D IE G+ ++ F V+ 
Sbjct: 184 TGTPAGVTKINSGDVIEFGLTDKLNSKFNVQ 214 


Pedant information for DKFZphfkd2_46j20, frame 1 


Report for DKFZphfkd2_46j20.1 


[LENGTH] 224 

[MW] 24843.07 

[pI] 6.96 

[HOMOL] PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 8e-55 

[FUNCAT] r general function prediction [M. jannaschii, MJ1656] 9e-40 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL168c] 4e-38 

[EC] 5.3.3.10 5-Carboxymethyl-2-hydroxymuconate delta-isomerase 1e-35 

[PIRKW] isomerase 1e-35 

[PIRKW] intramolecular oxidoreductase 1e-35 

[SUPFAM] 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 1e-46 

[PROSITE] MYRISTYL 4 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] Alpha_Beta 


SEQ MGIMAASRPLSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPIL 

PRD cccccccccchhhhhhcceeeeeecchhhhhhhhhccccccceeeecccccccccccccc 

SEQ MPAYTRNLHHELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWT 

PRD cccccchhhhhhheeeccccccccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccc 

SEQ LAKSFTASCPVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKII 

PRD cccccccccccceeeecccccccccceeeeecccccccccccccceeechhhhhhhhhhh 

SEQ TLEEGDIILTGTPKGVGPVKENDEIEAGIHGLVSMTFKVEKPEY 

PRD hccccceeeeccccccccccccceeeeeeccccccccccccccc 


Prosite for DKFZphfkd2_46j20.1 


PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 192->195 PKC_PHOSPHO_SITE PDOC00005 
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00008 2->8 MYRISTYL PDOC00008 
PS00008 75->81 MYRISTYL PDOC00008 
PS00008 116->122 MYRISTYL PDOC00008 
PS00008 191->197 MYRISTYL PDOC00008 
PS00009 78->82 AMIDATION PDOC00009 


(No Pfam data available for DKFZphfkd2_46j20.1) 

DKFZphfkd2_46k19 


group: transcription factors 

DKFZphfkd2_46k19.3 encodes a novel 130 amino acid protein similar to rat Dcoh, a bifunctional 
protein-binding transcriptional co-activator. 

Dcoh is a bifunctional protein, complexed with biopterin. It serves as dimerization cofactor 
of hepatocyte nuclear factor-1 and catalyzes the dehydration of the biopterin cofactor of 
phenylalanine hydroxylase. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by the hepatocyte nuclear factor-1. 


strong similarity to pterin-4-alpha-carbinolamine dehydratase 

potential start at Bp 102 according to similar proteins, 
both genomic sequences are from chromosome 5 

Sequenced by MediGenomix 

Locus: /map="5" 

Insert length: 5641 bp 

Poly A stretch at pos. 5617, polyadenylation signal at pos. 5598 


1 CAGCCCTCGG CAGACGGCCA ATGGCGGCGG TGCTCGGGGC GCTCGGGGCG 
51 ACGCGGCGCT TGTTGGCGGC GCTGCGAGGC CAGAGCCTAG GGCTAGCGGC 
101 CATGTCATCA GGTACTCACA GGTTGATTGC AGAGGAGAGG AACCAAGCTA 
151 TACTTGACCT TAAAGCAGCA GGATGGTCGG AATTAAGTGA GAGAGATGCC 
201 ATCTACAAAG AATTCTCCTT CCACAATTTT AATCAGGCAT TTGGCTTTAT 
251 GTCCCGAGTT GCCCTACAAG CAGAGAAGAT GAATCATCAC CCAGAATGGT 
301 TCAATGTATA CAACAAGGTC CAGATAACTC TCACCTCACA TGACTGTGGT 
351 GAACTGACCA AAAAAGATGT GAAGCTGGCC AAGTTTATTG AAAAAGCAGC 
401 TGCTTCTGTG TGATTTCTTC CAAAATACAT AAGTCTGAGA GGCTAAACTT 
451 GATGGCTGTG TTAACATATG TCACGTGTAG CACAGTGGAG AAAGCAGGAT 
501 ATGGCTCATA ATGACAGTGG TGAAGACCTG CGAATGAAGT TGCTAGTTAA 
551 CACCTACATT AGGGTTTGAC ATAGGTCTAT GTTATGGGTC GCTGCATCTG 
601 CTGGAACTCA CAGACTTTAC TATAGAGAAT CAAAGATCCC GTATCCGAAG 
651 TCTATGGAAA TGCTCATGGT GGTAAATTCC AACAGAATGA AACACCAAAC 
701 TTGCTTAAAG TAACTCACGT TTCAATTTGA AAGAGATATT GTCAAAATTG 
751 GAGGCCCCCA GGTTCCTGTC TGTTCCAAAT CTTTGCATGA TGACAGTGGT 
801 TTCTCTGATG TGGTAAGCTT TGGCTTTCTT CTGTTTTCTT TCTAAAAGAT 
851 CACTGGAGTA GAGAGGAGTT AAACAGACAT GACCTTTGAC CTCTTGCATG 
901 ACCTCCACAG ATAGCAAACC GGGCCGACAC ATGGTTGACG ATGTCCTTTT 
951 CTACAATGAA GTTAATGAAA GTTCTGAAAA TAGTGATTAC TTTCTGACAT 
1001 TGATAGGATT TAGGAAACCT CTGGATAAAT AGCTTAAGCA TGGCTGTTTA 
1051 TGTTTTTGCT ATAGACAAAA AGCAGCAGCA TGTACATTGT ATTTGGACAC 
1101 AAGCCTGCCT CGGTTAATAT ATTGAACTAT TGGACCACTA GGGTTAGTAG 
1151 GGAGCGGTCT GTACACTTTC TGATTCAGCA TTCAGAAACA TTCTAGGTGG 
1201 ACTCTGTAGC TTTCAGTTTT GTAAAGTTAT CGGAAAAACA TCGGGAGGGT 
1251 TTGGCCATCA TATGTGAGCT TTGTGTTTCA ATGCCAGTTA CTCAGGATTA 
1301 GTAAATTAAT GACTGTCCAG AGGACTTCAG GGTCACCAAG CTGCTGCACC 
1351 TGCCATTGGC TGACTCTCCC CGGCTATCTG TGGCTGAGAT GGTGCTGCTT 
1401 AGGTCACGCA GAGCATGAGC TGCTGCTGAA AGGGCACAGG AGATGGCCCT 
1451 TGGGCTTCTC ATCCCAGGAT GCCTGCCCTG CCCACCAATC CATGAGAAGA 
1501 TATGTATGAT TTCAGTAGGC CCTGGATCAG CTTGTCACCT CTGGTTTCCT 
1551 GTTTGCTTTC CACTCACTCA GCTGGAGTTT CATTTCCAGA CTAAAGTCTT 
1601 CATCATTGGC TTCAGAAACA GCATTCATCT GTGGCTGTGC TGATGTAGTA 
1651 CACCAAGAAC AACTGGGCTC TTCTCTGTCA CTTTCAGTGG GCTACCTTCC 
1701 CTCACCTCTC CAAGCAGCAT GAAAGAATTC TTTACATTTT TAATCTCTTT 
1751 TTTGTTTTTC CCTGAAAGTA TGCTTTGGTG CTTAAAGAGA GAAGTCACAA 
1801 AAGTATACTA CTGAGTTTCC TGGAGATGAA ATCCTGTTGT CCCTAGCTAT 
1851 GTGAATGAGC ACAGGGATCC CTGATGCCAT TATTTTGTAT ATTCATACGG 
1901 CACACACTTA CTGAGGGCCT TCTGTGTGCC CTAGGGGATT GAGCACAGTG 
1951 ACATATCAGG GCAGGTAGAA ACAGATGGAG AGCTGATGCG GGCTGTCTTA 
2001 GAGCAGCTGC CCCAGGAGGC CCCTGTGGAT GGATGTTGGG CAGGAGCCCT 
2051 GAGACGTTAG GGGCATATAA CTAAAGGACA TAGCAGGAGT TATAGGAGGA 
2101 GCTGATCCCT GAGGGAAACA ATGAAGACGG AGAAGATGGG GCTAAAGTTT 
2151 GAATTGTGGG GACATTAATC ACGGTGATTC TTAAAACTTT GCTGTTGATG 
2201 ATTTTAAATG GAGAAAATGA GTACGTAAGA TGTTATTTCC CAGTTCAGTA 
2251 TATAGGTTGC CCACAAAGTA TTTTCCTACC ATGAATGGTC ATATATACTT 
2301 GTTGTAGAAT ACCAGGGACA GCAGAGATGG TGGGGTAGTT ACTTCCTTTT 
2351 CTTACAGCCC AAGAACTTTG GTGTCCAGGA GATTGACCAA TTTAGCCACT 
2401 GAGCATTTAA TACAACACAG GGCTACCCAG ATCCCACTGT CCTGATTTGC 
2451 CCTGAAAGCC AAAGGAGTCA GGAGAAGGTG AGTGGGGTGA ATATATTAAT 
2501 CCTGAGAGTT GAACAGAGCA AAAATCCCTA TTACTTTTGT ACTTAAAACA 
2551 TCTCTGCCAC ATGTGCTCAC TCTTTATATT CTGTTTAGGT GGTTTATATG 
2601 TGCACATCCC ATCCTATGCC TGCAGTTAGC CAACTCAGGG TTTATATTGC 
2651 CTCCTTTCTT TTTTTCTTTT TTTTTTTTTT TTTTAAGAGA TGGGGTCTCG 
2701 TTCTGTCATG CAGACTGGAG TGCAGTGGTG TGATCACAGC TCATTGTAAC 
2751 CTCCAACGCC TGGACTGAAG TGATCCTCCT GCCTTGGCCT CTCTGGTAGC 
2801 TGGGACTACA GGTGCATGCC ACCACACCCA CCTAATTTTT TTTATTTTTA 
2851 TTTTTTGTAG AGACAGTCTC ACTATCTTGC TCGGGCTGGT CCTGAACTCC 
2901 TGGGCTCAAG TTATCTTGCT GCCTCAGCCT CCCATGGGTA ATCTTTATTT 
2951 CCTTTTTTTT TTTTTTTTGG AGATGGAGTT TCGCTCTTGT CGCCCAGGCT 
3001 GGAGTGCAAT GGCACGATCT TGGCTCACTG CAGTCTCCAC CTCCTGGGTT 
3051 CAGGTGATTC TCCATCCTCG GCCTACTGAG TAGCTGAGAT TACAGGCAAC 
3101 TGCCACCATG CGCGGCTAAT TTGTGTATTT TTTTTTAGTA AGAGATGGGG 
3151 TTTCGCCATG TTGGCCGGAC TGGTCTTAGA CTCCTGACCT CAAGCGACCT 
3201 GCCTGCCTTG GCCTCCCAAA GTGCTGGGAT TACAGGCATG AGCCGCTATG 
3251 CCTCGTCGCT GATTTTTATT TCTTATTTTT TTTTTAGAGA TGGGGGTCTC 
3301 ACTATGCTGC TCAGGCTGAT CTCAAACTCC TGGGCTCAAG TGATCCTCCC 
3351 ACCTTAGCCT CCCAAGTTGC TGGGATTATA AGTGTGAGCC ACTATCCCTA 
3401 CCTCACTATT ACCTTCTTTG CTTCTCTTGT TTTCTTTTGT TCTAAGTCAA 
3451 ACCCATCACA ATCTTTTCTT GTCCTTCCAG GTGTTTTCCA GTGCTGTGCC 
3501 CTGGATGTGC TCTCTTTCTC TTAGAGCCCA GAGAACTTGC TTTTCCCCCT 
3551 TATATATGAC CCTTAACTTT TTCTAACACA TTATTAAGGG CCTGTGTCTA 
3601 TCAGCTGGGG GCACTTCTTG AAGGGAGGGC CTTTGTGTGG TCTGTTTCTA 
3651 GTGACTTCCA GCTTTAACCC AGAGCCTCAT GATTGCTGGG TGCCCATAGC 
3701 CTTTTTGCTG AATGGAGGCA CTCAGTCTCC TTGGGAAGAG AGAATCCATG 
3751 ATAGACCCAC TTGGGAGCTC CCCACTTCAG GGGCCTACAC ACTGGTAATG 
3801 CAACAGAATG CCCAAGAGTG ACCTCATAAA GCAAGGATTC CCTTCGTGGC 
3851 CCCTTCTCTG CTGCCTCTCA GAATCCAGAC GCTAAGGAAA ATCCCTAAGC 
3901 AGAGATTTTC TGTTGGATGC TAAAAGCAAG GAATAAAAGT TGAAAATTTG 
3951 GAAAATGTCT CAACACCGTC ACCAGCGCCA CTCGAGAGTC ATTTCTAGTT 
4001 CACCAGTTGA CACTACATCG GTGGGATTTT GCCCAACATT CAAGAAATTT 
4051 AAGTAAATAT TATCTATCTC CATTGCCTGT TAAGAAATGT GCTAGTAGAA 
4101 GTGTGAGGGC AGGGTGTCAG TGTTCTCTCA GCCTCTTCCC TCAGATACTC 
4151 GTCTGCTTAC CAAAATAAGT TGCATGTCCT TGACAATCTG GTTTCTATGA 
4201 TTGGTGAGGC TGGCATGCTA TTACCTTTAT GTGCCCTGTA GACTTGAATG 
4251 ACCAGTTTGA CCAGTTTGAC TGTTAGATAA TCAGAAGGCT TTTCTCTTTT 
4301 TTTATAATAG ACCCCATCTC AAATCAGATA ATGAAAATTA CATATCTTGA 
4351 TATATTAGAA AAGTATATAC ATTCTGGCTG GGCACGGTGG CTCACGCCTG 
4401 TAATCCCTGC ACTTTGAGAG GCTGGGGCGG ATCACTTGAG GTCAGGAGTT 
4451 TGAGACCGGC CTGGCCAGCG TGGCGAAACC CCATCTCTAC TAAAAATACA 
4501 CAGATTAGCC CGGAGTGATG GTGTGCACCT GTTGTCCCAG CTACTCAGGA 
4551 TGCTGAGGCA GGAGAATCCC TTTAACCTGG GGGGCGAAGG TTGCAGTGAG 
4601 CCAGGATTGC ACCACTGCAC TCCAGCCTGG GTGACGGAAC GGGACTCTGT 
4651 CTCAGAAAAA AAAAAAAAGA AGAGGAAAAA GAAAAATATA TATTCTATAT 
4701 TTTTTTAACT TATGAGAATG TGTTCATTTC ATTTGTAACA TATAATGGGA 
4751 AACAGTAATA CGTACTCTGA GAAAAATTGC AAAGCACAGA TAAATGGAAA 
4801 TAAACAGGAA AAAGAATCAC CTATAACCTC ACCATCCATA GACAGACACT 
4851 GTTAAAATTT TGGCATATTT CCTGCTGATT TTTTCTACTG CTGATTTTTG 
4901 CACAGGTGAG ATAATTTTGA ACAGAGAATT TTGTATCTTT GGTTTTTGTG 
4951 TTTCGCTGCA CACAAAAACA AAAGATATAA AAATGGATCA TAAACATTTT 
5001 TCTAAATCCT GAAAAGTGCA TAGACATATT TTAGTGCCTG TATTTCACAA 
5051 GATGGACATA CCATAATTTA CTTACACAGT CCTTTTTGTT AGATGTTTAA 
5101 GTTGTTTTCA AGCTTCTCAG TGCTGGAAAA AATACTGAGA TAGACATGTT 
5151 TAGTTGAAGT TATTTCATTT CAGGTTATAT TATCTTGGGT CAGAGAATGA 
5201 ATGGTTCTCA GGCTTTTCAA AAGAGCTGGT CAGTTTTTAT GCCTCTGGCA 
5251 GTTTTTGAGA GTGCTCAATC ATACTACACT GTTGCCAGCA TTAGATCTTA 
5301 TCACATTTAA GTCATTGCTA ATTTTATAAA CAAAAACAAT GGTTTTACTT 
5351 TGCATCTCCC TGATTGGTGT TGCTGTAGAA CATATTTGGA GAAGTTTGTT 
5401 TGTCTTTGGT GTTTATTCCA TGAATAGATT GTGTGCCCAT TTTCTCTTGG 
5451 GGTATTCAGT TTTTTATTAC TGATGTGAGC ATGTGTATGG GTGATTATTT 
5501 GATGATTATC AGTTTTGCTT AGTAGACTGG CAATATTTAG TCTTGCTGTC 
5551 ACTGTGTTCC CAGTGCCAAC TAGATTGCTT GATATGTAGT TGCCACTCAA 
5601 TAAAGATTTG TTGAGTCAAT GAAAAAAAAA AAAAAAAAAA A 


BLAST Results 


Entry AC004764 from database EMBL: 

Homo sapiens chromosome 5, P1 clone 255g5 (LBNL H61), complete 
sequence. 

Score = 11057, P = 0.0e+00, identities = 2217/2224 
Bp 428-5625 of cDNA == Bp 2912-8107 of AC004764 

Entry HSAC1555 from database EMBL: 

Homo sapiens (subclone 1_d8 from BAC H75) DNA sequence, complete 
sequence. 

Score = 575, P = 5.1e-30, identities = 115/115 
Bp ^240-430 of cDNA = HSAC1555 splice pattern 

Medline entries 


93106787: 

Phenylalanine hydroxylase-stimulating protein/pterin-4 
alpha-carbinolamine dehydratase from rat and human liver. 
Purification, characterization, and complete amino acid 
sequence. 

93101632: 

Identity of 4a-carbinolamine dehydratase, a component of 
the phenylalanine hydroxylation system, and DCoH, a 
transregulator of homeodomain proteins. 

95242099: 

Crystal structure of DCoH, a bifunctional, protein-binding 
transcriptional coactivator. 


Peptide information for frame 3 


ORF from 21 bp to 410 bp; peptide length: 130 
Category: strong similarity to known protein 
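
As a cross-check on the stated reading frame, the first ten residues of the peptide can be recovered by translating the insert from bp 21. The sketch below assumes the standard genetic code and uses only the first 50 bases of the insert listed above for this clone; the names `CODON_TABLE` and `translate` are hypothetical and are not part of any tool cited in this report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start_bp):
    """Translate seq from a 1-based start position until a stop codon or the end."""
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


first_50_bp = "CAGCCCTCGGCAGACGGCCAATGGCGGCGGTGCTCGGGGCGCTCGGGGCG"
print(translate(first_50_bp, 21))
```

Starting at bp 21 places the ORF in frame 3, since (21 - 1) mod 3 = 2, which is consistent with the "frame 3" heading of this report.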


1 MAAVLGALGA TRRLLAALRG QSLGLAAMSS GTHRLIAEER NQAILDLKAA 
51 GWSELSERDA IYKEFSFHNF NQAFGFMSRV ALQAEKMNHH PEWFNVYNKV 
101 QITLTSHDCG ELTKKDVKLA KFIEKAAASV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46k19, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_46k19, frame 3 

Report for DKFZphfkd2_46k19.3 


[LENGTH] 130 

[MW] 14377.56 

[pI] 9.17 

[HOMOL] PIR:A47189 pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) - rat 4e- 

[FUNCAT] 01.07.99 other vitamin, cofactor, and prosthetic group activities [S. cerevisiae, YHL018W] 5e-04 

[SCOP] d1dchg_ 4.38.1.1.1 Pterin-4a-carbinolamine dehydratas 4e-50 

[EC] 4.2.1.96 Tetrahydrobiopterin dehydratase 6e-34 

[PIRKW] nucleus 6e-34 

[PIRKW] carbon-oxygen lyase 6e-34 

[PIRKW] homotetramer 6e-34 

[PIRKW] hydro-lyase 6e-34 

[PIRKW] cytosol 6e-34 

[PIRKW] acetylated amino end 6e-34 

[PIRKW] homodimer 6e-34 

[SUPFAM] pterin-4-alpha-carbinolamine dehydratase 6e-34 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 14.62 % 



SEQ MAAVLGALGATRRLLAALRGQSLGLAAMSSGTHRLIAEERNQAILDLKAAGWSELSERDA 

SEG . xxxxxxxxxxxxxxxxxxx 

1dchB CCCCHHHHHHHHHHHHHHCCEEECCCCE 

SEQ IYKEFSFHNFNQAFGFMSRVALQAEKMNHHPEWFNVYNKVQITLTSHDCGELTKKDVKLA 

SEG 

1dchB EEEEEECCCHHHHHHHHHHHHHHHHHHCCCCEEEETTTEEEEEECBTTTTBTCCHHHHHH 

SEQ KFIEKAAASV 

SEG 

1dchB HHHHHHHHHH 


Prosite for DKFZphfkd2_46k19.3 


PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005 
PS00005 32->35 PKC_PHOSPHO_SITE PDOC00005 
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006 
PS00008 6->12 MYRISTYL PDOC00008 
PS00008 20->25 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfkd2_46k19.3) 

DKFZphfkd2_46m4 


group: signal transduction 

DKFZphfkd2_46m4.3 encodes a novel 198 amino acid putative GTP-binding protein related to the 
SAR1 family of Ras superfamily members. 

SAR1 proteins are involved in vesicular transport between the endoplasmic reticulum and the 
Golgi apparatus. 

The new protein can find clinical application in modulating the transport of vesicles to the 
Golgi apparatus, thus enabling post-translational modification of the vesicle contents. 
Blocking of the molecule is expected to result in modulation/blocking of secretory pathways. 


nearly identical to mouse GTP-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="43e-9 cR from top of Chr10 linkage group" 
Insert length: 2996 bp 

Poly A stretch at pos. 2969, polyadenylation signal at pos. 2958 


1 ACATCCGGCG AGTAGCTGGC GGTCCCGGGT GCTGCTGGTT AGTGTGCTCT 
51 GAGGGAGGGT CCGAGCCAGC CGCTGTTTTG CCGGAGGAGC CCCTCAGGCC 
101 GTAGTAAGCA TTAATAATGT CTTTCATCTT TGAGTGGATC TACAATGGCT 
151 TCAGCAGTGT GCTCCAGTTC CTAGGACTGT ACAAGAAATC TGGAAAACTT 
201 GTATTCTTAG GTTTGGATAA TGCAGGCAAA ACCACTCTTC TTCACATGCT 
251 CAAAGATGAC AGATTGGGCC AACATGTTCC AACACTACAT CCGACATCAG 
301 AAGAGCTAAC AATTGCTGGA ATGACCTTTA CAACTTTTGA TCTTGGTGGG 
351 CACGAGCAAG CACGTCGCGT TTGGAAAAAT TATCTCCCAG CAATTAATGG 
401 GATTGTCTTT CTGGTGGACT GTGCAGATCA TTCTCGCCTC GTGGAATCCA 
451 AAGTTGAGCT TAATGCTTTA ATGACTGATG AAACAATATC CAATGTGCCA 
501 ATCCTTATCT TGGGTAACAA AATTGACAGA ACAGATGCAA TCAGTGAAGA 
551 AAAACTCCGT GAGATATTTG GGCTTTATGG ACAGACCACA GGAAAGGGGA 
601 ATGTGACCCT GAAGGAGCTG AATGCTCGCC CCATGGAAGT GTTCATGTGC 
651 AGTGTGCTCA AGAGGCAAGG TTACGGCGAG GGTTTCCGCT GGCTCTCCCA 
701 GTATATTGAC TGATGTTTGG ACGGTGAAAA TAAAAGAGTT TTACTTCTCT 
751 GGACTGATCC TATTCACAGC TTCCTCATGA ACTTTTCTAA TAGAACAAGG 
801 ATAGCTCTCC AACCATGTCT GGCGTTGAGA AGCCAAGAGT CTCTGTCAAC 
851 TCTCTCATTG CCCAGTGGTG ACATGTGCTC TTCTCCACAC TGTTGGGAGG 
901 TAATGCTGCC CCACGTGCTG GTGCAGGTCA GTATCCTGGG ACTTGGAAGC 
951 TGGCAGGATT TGCCGGGTAA AGCTGTATGC CATCATGGGG CACCTGAAAA 
1001 GAAAAACACG TCTCACCACT GTGGTTGATT CAAAAGAAAG TGATTCTATT 
1051 TTTTAAAGAA AGCGTTGTTA ATGTAATTGG TATCCCTCCT AACTTTTTGA 
1101 GTTCACAATT TACTTGGTCC AGAGTTTTCT ATTCTTTTTT TTTTTTTAAA 
1151 CTAATGAATG ACATTTAGAT ACTTCATAAA ATTATGAACA GATATGGAGG 
1201 CCAGAGCTCA TTTGGGTAAA CTTACTCCTG CTGAGTTAGC AGGTTGGTGA 
1251 GAGAAGCTCC CCTGAGCTCA CCTGTCTCTC TGACTGCCTT GGAGTAGGTG 
1301 GCATAACCTT GTGCACAGAG AACTAGAAAA GGGGCAGAAC CCCGGCCTTG 
1351 CAGTTGTGGC AGGTTTCCAC TGTGGTAAGC TAGGTTCATT CCTCATCAAG 
1401 GAATGTGTAG CAGATTGTTC ACTGTGGAGG AGGTAATTAT AGAATGGGTT 
1451 ATTGTTGTTA TTCTTACTCA TGAAGTTACA GATTTTAGCC AGTCTTTGCT 
1501 TTTATACTTT TGTGAAATTT AATTTCTCTC TATAGCACCT TCCTTTTTCG 
1551 TTTTCAGTTA TCAAAAGTGA CTTTGACCTC ATAAGAGAGT TGAGAACATC 
1601 TCTCGTGTCA CATACTGCAG GTGCATCAGT TACTTTTGCA CAGATTCTAG 
1651 GGGGACATTT TTCTGAATAG GAAGACAGGA CAAAGTTAAC AGCTTAAGGG 
1701 CTCTTAATTC TGTGAGTTGA GGACTTAAAA GTATTGTAGC ATTTGTTTGG 
1751 ATCCATGAAA AATGTATTCA GTGGGCTTTA AAATTTCCAT TTGCAGAATT 
1801 TGGTCTCTCA GGCTGTTTGG GAGCTCTTTT TTTTACATTT TTTCTCCTTT 
1851 GACACCTATT TTATTGGTGT TTAAAGTAAA GGTTAACATC TGTAGCTTTT 
1901 CCAGGTTTTT TTTTTTTTTT TTGATATGAA ATTGTCTTTC TCCATTGCAG 
1951 AAATAAGCTA GGGAAACACT AACCCAAAAA CTTTCTGTAG AGCTGTTCCT 
2001 TTGGAGGCAG CATCACTTAT TGGCAGTAAA GACTCAGTAT AAAAGCACCA 
2051 GCATCCCTAC TTGGGTGATG GGGATTAATT TTATAGCATT CCATTTTCCT 
2101 AGTGCCACAT GTGAAATTGG ATTTTGATGA TCTTAATCTA TATTCTACCC 
2151 TTATAATAAA AGATCAAAAG ATATATCTCC TATGAACAGA TTGCAGATAG 
2201 GAGATGAAAA GTTGGGAGGA TGCCTTTATT CTAATGTGAG GGTAGGGAAA 
2251 ATGTGGATAA CATTACTGGG GTGAAGGAGG CATTGTTCTT TAGTTGGAGT 
2301 TCTCATTTTT ATTCTCCAGT ACTGACTTGT GGGGAAAGCA TACTTTTTCA 
2351 CTGCCAGGTA CTGAATGCAG AGGCTCAGTG AAGTATATAT GTGGGAAGTG 
2401 CATGCATTTC GTTTATTAGC AAACATAGCT GGATTAAGAC GAAGTTGTTG 
2451 GTTTGGAAAG GGGTTAAAGC CTTAAGTGAA CAAATCTAGC TAACAGTGAA 
2501 TGAACTAGGT AATATAACTT GCATATTTTT AATTTCCTTT GGTTAAAGGT 
2551 CCCCCATACT TCTCTGTTCG GAGACATGAG AAGTATGATT ACTTCAGTGT 
2601 TAGTTTTCTT AATTTTTTTT TTCCCCTATT TGTCCCTTGT CACTTTGTTG 
2651 CAAGCTAGAA ATCTGTGGGT TATACATAGG GCAGCTCTTT GCGAAAGTGG 
2701 TTTATTCCAC TGGAGAAAGG GGATTGAAAA TCAGTTAGAA CCAATGTATT 
2751 TCTTGCCCCA CGGAACACTA TTCCTATAAG ATAGCTGAAA GAAGCTGCTG 
2801 TGAGGAGCTC AGCTCCAACA CAGGATCAGC ACCTTGTATA GGAATTCCCA 
2851 TGAATTATGA CTTCTCATTC TGTTTTATCA GAGTGCATAT ATGTCCTACT 
2901 TCAGGAAAAG TAAAACAGTC ATTTACGAAA GAAAGTCAAT CTGTATCCTA 
2951 AGCATTTTAA TAAAAAGTTA AAACAAAAAA AAAAAAAAAA AAAAAA 


BLAST Results 


Entry H567931d from database EMBL: 
human STS WI-16722. 
Length = 265 
Minus Strand HSPs: 

Score = 1242 (186.4 bits), Expect = 2.8e-50, P = 2.8e-50 
Identities = 260/265 (98%) 


Medline entries 


94085558: 

Molecular analysis of SAR1-related cDNAs from a mouse 
pituitary cell line. 


Peptide information for frame 3 


ORF from 117 bp to 710 bp; peptide length: 198 
Category: strong similarity to known protein 
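
The relationship between the ORF coordinates, the peptide length and the reading frame number can be sketched as below. This is an illustrative Python helper under stated assumptions: `orf_summary` is a hypothetical name, coordinates are taken as 1-based and inclusive, and the stop codon is assumed to lie outside the stated range (in this entry a TGA follows directly at positions 711 to 713).

```python
def orf_summary(start, end):
    """Return (peptide_length, frame) for a 1-based inclusive ORF range.

    Assumes the range covers whole codons and excludes the stop codon.
    The frame is numbered 1 to 3 from the first base of the insert.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    peptide_length = length_nt // 3
    frame = (start - 1) % 3 + 1
    return peptide_length, frame


# ORF of DKFZphfkd2_46m4: 117 bp to 710 bp.
print(orf_summary(117, 710))  # (198, 3)
```

The result agrees with the listing: 594 nucleotides give 198 codons, and a start at position 117 falls in frame 3.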


1 MSFIFEWIYN GFSSVLQFLG LYKKSGKLVF LGLDNAGKTT LLHMLKDDRL 
51 GQHVPTLHPT SEELTIAGMT FTTFDLGGHE QARRVWKNYL PAINGIVFLV 
101 DCADHSRLVE SKVELNALMT DETISNVPIL ILGNKIDRTD AISEEKLREI 
151 FGLYGQTTGK GNVTLKELNA RPMEVFMCSV LKRQGYGEGF RWLSQYID 


BLASTP hits 


Entry S39543 from database PIR: 
GTP-binding protein - mouse 
Length = 198 

Score = 1029 (362.2 bits), Expect = 5.1e-104, P = 5.1e-104 
Identities = 197/198 (99%), Positives = 198/198 (100%) 


Entry SARA_MOUSE from database SWISSPROT: 
GTP-BINDING PROTEIN SARA. 
Length = 198 

Score = 1012 (356.2 bits), Expect = 3.2e-102, P = 3.2e-102 
Identities = 195/198 (98%), Positives = 196/198 (98%) 

Entry CEZK180_4 from database TREMBL: 
gene: "ZK180.4"; Caenorhabditis elegans cosmid ZK180. 
Length = 193 

Score = 679 (239.0 bits), Expect = 6.3e-67, P = 6.3e-67 
Identities = 125/197 (63%), Positives = 161/197 (81%) 


Alert BLASTP hits for DKFZphfkd2_46m4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_46m4, frame 3 


Report for DKFZphfkd2_46m4.3 


[LENGTH] 198 
[MW] 22367.00 
[pI] 6.21 
[HOMOL] PIR:S39543 GTP-binding protein - mouse 1e-112 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL218w] 1e-58 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YPL218w] 1e-58 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 2e-23 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 4e-22 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 3e-20 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 3e-19 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 2e-09 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 2e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YHRlGSw] 7e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 1e-04 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YKL154w] 1e-04 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 1e-04 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 1e-04 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKL154w] 1e-04 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YML001w] 3e-04 
[BLOCKS] BL00395A Alanine racemase pyridoxal-phosphate attachment site proteins 
[BLOCKS] BL01019B ADP-ribosylation factors family proteins 
[BLOCKS] BL01019A ADP-ribosylation factors family proteins 
[BLOCKS] BL01020D SAR1 family proteins 
[BLOCKS] BL01020C SAR1 family proteins 
[BLOCKS] BL01020B SAR1 family proteins 
[BLOCKS] BL01020A SAR1 family proteins 
[SCOP] d1pij 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 7e-36 
[SCOP] d1guaa__ 3.25.1.3.10 Rap1A [human (Homo sapiens) 8e-40 
[SCOP] d1rrf 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-55 
[SCOP] d1hurb_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-58 
[SCOP] d1gota2 3.25.1.3.3 (1-54,171-326) Transducin (alpha subunit) [ra 2e-33 
[SCOP] d1tadb2 3.25.1.3.2 (1-30,152-316) Transducin (alpha subunit 6e-36 
[PIRKW] glycoprotein 4e-19 
[PIRKW] monomer 1e-16 
[PIRKW] P-loop 3e-64 
[PIRKW] lipoprotein 4e-19 
[PIRKW] GTP binding 3e-64 
[SUPFAM] ADP-ribosylation factor 5e-22 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 3 
[PROSITE] SAR1 1 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 


SEQ MSFIFEWIYNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPTLHPT 
1hurA TTTTTCCCCEEEEEETTTTCHHHHHHHHCCCCEEEEEEETTEE 

SEQ SEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHSRLVESKVELNALMT 
1hurA EEEEEETTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHH 

SEQ DETISNVPILILGNKIDRTDAISEEKLREIFGLYGQTTGKGNVTLKELNARPMEVFMCSV 
1hurA TTTTTTTEEEEEEETTTTTTTCCHHHHHHHHCGG 

SEQ LKRQGYGEGFRWLSQYID 
1hurA 


Prosite for DKFZphfkd2_46m4.3 


PS00001 162->166 ASN_GLYCOSYLATION PDOC00001 
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005 
PS00006 60->64 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 111->115 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 68->74 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00017 32->40 ATP_GTP_A PDOC00017 
PS01020 171->197 SAR1 PDOC00782 


Pfam for DKFZphfkd2_46m4.3 


HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 

HMM ♦GMgWfsIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgEIVTTTPT 
 ++ FS++++++GL++K++++++LGLDNAGKTT+L+MLK++++ +++PT 
Query 9 -YNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPT 56 

HMM IGFNVETVeYKNIKFNVWDVGGQdSlRPYWRHYYpNTDGIIWVVDSaDRD 
 +++++E++++ +-I-+F++4D+GG++++R++W++Y P+++GI+++VD+AD++ 
Query 57 LHPTSEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHS 106 

HMM RMeEaKqELHaMLNEEELrDAPlLIFANKQDLPgAMSesEIREaLGLHel 
 R+ E+K+EL+A++++E ++++P+LI++NK+D+ +A+SE+++RE+ GL+ + 
Query 107 RLVESKVELNALMTDETISNVPILILGNKIDRTDAISEEKLREIFGLYGQ 156 

HMM RCn RPWYIQMCCAVtGEGLYEGMDWLSNYInlcRJcK* 
 +++ RP++++MC++++++G++EG++WLS+YI 
Query 157 TTGKGNVTLKELNARPMEVFMCSVLKRQGYGEGFRWLSQYI 197 




WO 01/12659 


PCT/IB00/01496 


DKFZphfkd2_47a4 


group: transcription factor 

DKFZphfkd2_47a4.1 encodes a novel 280 amino acid protein with similarity to zinc finger 
proteins. 

The new protein is a putative transcription factor with one C2H2 zinc finger. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 


similarity to C. elegans F46B6.7 

potential frame shift at 1092, will be checked, see BLASTX 
Sequenced by MediGenomix 
Locus: map="7q31" 
Insert length: 1756 bp 

Poly A stretch at pos. 1737, no polyadenylation signal found 


1 CCCTTTTCTT TTCTGCCGGG TAATGGCTGC TTCCAAGACC CAGGGGGCTG 
51 TCGCCCGAAT GCAGGAAGAC CGTGATGGGA GCTGCAGCAC AGTCGGGGGT 
101 GTAGGTTATG GGGTAAGGAT TGTATCCTGG AGCCGCTTTC CCTGCCAGAA 
151 AGTCCAGGTG GCACCACCAC TTTAGAAGGT TCTCCATCTG TGCCTTGTAT 
201 TTTCTGTGAA GAACATTTTC CTGTGGCTGA ACAAGACAAA CTTCTGAAGC 
251 ACATGATTAT TGAGCATAAG ATTGTCATAG CTGATGTCAA GTTGGTTGCT 
301 GATTTCCAAA GGTACATTTT ATATTGGAGG AAAAGGTTCA CTGAACAGCC 
351 CATCACAGAT TTTTGTAGTG TAATAAGAAT TAATTCCACT GCTCCATTTG 
401 AAGAACAAGA GAATTATTTT TTGTTATGTG ACGTTTTACC AGAAGATAGA 
451 ATTCTTAGAG AAGAGCTTCA GAAACAGAGA CTGAGAGAAA TTCTGGAACA 
501 ACAGCAGCAA GAACGAAATG ATAACAATTT TCATGGCGTT TGTATGTTTT 
551 GCAATGAAGA ATTCCTTGGA AACAGATCTG TTATTTTGAA CCACATGGCC 
601 AGAGAACATG CTTTCAACAT TGGATTGCCA GACAACATTG TAAACTGCAA 
651 TGAATTTTTG TGTACATTAC AGAAAAAGCT TGACAATTTG CAGTGCTTGT 
701 ACTGTGAGAA GACCTTCAGG GGCAAAAATA CACTTAAAGA TCACATGAGG 
751 AAAAAACAGC ATCGTAAGAT TAATCCTAAG AACAGAGAAT ATGACAGATT 
801 TTATGTCATC AATTATTTGG AACTTGGAAA ATCGTGGGAG GAAGTTCAGT 
851 TGGAAGATGA TCGGGAGTTG CTGGACCATC AGGAAGATGA CTGGTCTGAT 
901 TGGGAAGAAC ACCCTGCCTC TGCAGTCTGC TTATTTTGTG AAAAGCAAGC 
951 AGAAACAATT GAGAAGTTGT ATGTCCACAT GGAGGATGCA CACGAATTTG 
1001 ATCTTCTCAA AATAAAGTCA GAACTTGGAT TAAATTTCTA TCAGCAAGTG 
1051 AAACTGGTCA ATTTTATTCG GAGGCAAGTT CACCAATGCA GATGATGGCT 
1101 GCCATGTGAA GTTCAAATCC AAAGCAGACT TAAGAACTCA CATGGAAGAA 
1151 ACTAAACACA CTTCGCTGCT CCCCGATAGA AAGACGTGGG ATCAACTGGA 
1201 GTATTATTTT CCAACCTATG AAAATGACAC TCTCCTGTGT ACACTATCTG 
1251 ACAGTGAAAG TGACCTGACA GCTCAGGAAC AAAATGAAAA TGTTCCCATC 
1301 ATCAGTGAAG ATACATCTAA ACTGTATGCT TTGAAACAAA GCAGTATTTT 
1351 GAACCAGTTG CTACTATAAG AGTACTTGAA AACCTAGAAG AAACTACCAC 
1401 AGAAGCAATT TTTCATGTTT TTCTCCTATG AGACAGATAT GAAAGAACAA 
1451 TTTAAATTTG AACATCAACA AAAGATTGGT CCTTGGTGAA ATAAACTTTT 
1501 CAAAAATGAA TGTTCTTTTC AAAAAATAAA GTAGAAAAAT GCACTTACTA 
1551 AGAACATGAA AAAAAAATGA AGTAGGAAAA TAAGATGAAG ACTTTGTATT 
1601 TTGGCTGTAA AGTTTTATTG TGTGATCATC TTAAATTATC TCACTTCATT 
1651 AAACTCATAA TTATATATAG AAGTATATGT CAATTACAAA GAAATGAAAT 
1701 GTTCAAATTA TTTATAAACC TGATTTTTCA ATCAGCGAAA AAAAAAAAAA 
1751 AAAAAA 


BLAST Results 


Entry AC004112 from database EMBL: 

Homo sapiens BAC clone RG313E03 from 7q31, complete sequence. 
Score = 2660, P = 3.0e-241, identities = 534/535 
> 10 exons 

Entry AC004111 from database EMBL: 

Homo sapiens BAC clone RG103H13 from 7q31, complete sequence. 
Score = 598, P = 5.8e-17, identities = 128/137 
1 exon 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 253 bp to 1092 bp; peptide length: 280 
Category: similarity to unknown protein 


1 MIIEHKIVIA DVKLVADFQR YILYWRKRFT EQPITDFCSV IRINSTAPFE 
51 EQENYFLLCD VLPEDRILRE ELQKQRLREI LEQQQQERND NNFHGVCMFC 
101 NEEFLGNRSV ILNHMAREHA FNIGLPDNIV NCNEFLCTLQ KKLDNLQCLY 
151 CEKTFRGKNT LKDHMRKKQH RKINPKNREY DRFYVINYLE LGKSWEEVQL 
201 EDDRELLDHQ EDDWSDWEEH PASAVCLFCE KQAETIEKLY VHMEDAHEFD 
251 LLKIKSELGL NFYQQVKLVN FIRRQVHQCR 


BLASTP hits 


Entry CEF46B6_6 from database TREMBLNEW: 
product: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 
>TREMBL:CEF46B6_6 product: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 
Score = 630, P = 1.1e-61, identities = 123/289, positives = 183/289 


Entry AF059531_1 from database TREMBLNEW: 
gene: "PRMT3"; product: "protein arginine N-methyltransferase 3"; Homo 
sapiens protein arginine N-methyltransferase 3 (PRMT3) mRNA, partial 
cds. >TREMBL:AF059531_1 gene: "PRMT3"; product: "protein arginine 
N-methyltransferase 3"; Homo sapiens protein arginine 
N-methyltransferase 3 (PRMT3) mRNA, partial cds. 
Score = 120, P = 1.5e-04, identities = 23/78, positives = 42/78 

Entry YB9M_YEAST from database SWISSPROT: 
34.7 KD PROTEIN IN SHM1-MRPL37 INTERGENIC REGION. 
Score = 112, P = 4.6e-04, identities = 43/165, positives = 71/165 


Alert BLASTP hits for DKFZphfkd2_47a4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_47a4, frame 1 


Report for DKFZphfkd2_47a4.1 


[LENGTH] 280 
[MW] 33921.94 
[pI] 5.63 
[HOMOL] TREMBL:CEF46B6_5 gene: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 1e-56 
[BLOCKS] BL01032B Protein phosphatase 2C proteins 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[PROSITE] MYRISTYL 1 
[PROSITE] ZINC_FINGER_C2H2 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 2 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Zinc finger, C2H2 type 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 8.21 % 


SEQ MIIEHKIVIADVKLVADFQRYILYWRKRFTEQPITDFCSVIRINSTAPFEEQENYFLLCD 
SEG 
PRD cccccceeehhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccchhhwieeeecc 

SEQ VLPEDRILREELQKQRLREILEQQQQERNDNNFHGVCMFCNEEFLGNRSVILNHMAREHA 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccccceeeehhhhhhhh 

SEQ FNIGLPDNIVNCNEFLCTLQKKLDNLQCLYCEKTFRGKNTLKDHMRKKQHRKINPKNREY 
SEG 
PRD hcccccccccchhhhhhhhhhhhhhhhheeecccccccchhhhhhhhhhhcccccccccc 

SEQ DRFYVINYLELGKSWEEVQLEDDRELLDHQEDDWSDWEEHPASAVCLFCEKQAETIEKLY 
SEG 
PRD ceeeeeeeeccccchhhhhhhhcchhhhhhcccccccccccccccchhhhhhhhhhhhhh 

SEQ VHMEDAHEFDLLKIKSELGLNFYQQVKLVNFIRRQVHQCR 
SEG 
PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccc 


Prosite for DKFZphfkd2_47a4.1 


PS00001 107->111 ASN_GLYCOSYLATION PDOC00001 
PS00001 ? ASN_GLYCOSYLATION PDOC00001 
PS00004 27->31 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005 
PS00006 160->164 CK2_PHOSPHO_SITE PDOC00006 
PS00006 194->198 CK2_PHOSPHO_SITE PDOC00006 
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006 
PS00007 178->185 TYR_PHOSPHO_SITE PDOC00007 
PS00007 13->22 TYR_PHOSPHO_SITE PDOC00007 
PS00008 124->130 MYRISTYL PDOC00008 
PS00028 148->171 ZINC_FINGER_C2H2 PDOC00028 


Pfam for DKFZphfkd2_47a4.1 


HMM_NAME Zinc finger, C2H2 type 

HMM ♦CpwPDCgKtFrrwsNLrRHMR. .T.H* 
 C + C+KTFR + +L+ HMR H 
Query 148 CLY--CEKTFRGKNTLKDHMRKK-QH 170 


DKFZphfkd2_4b6 


group: kidney derived 

DKFZphfkd2_4b6 encodes a novel 133 amino acid protein with similarity to Homo sapiens clone 
25003, partial CDS. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 

similarity to Homo sapiens clone 25003 

complete cDNA, complete cds, few EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1936 bp 

Poly A stretch at pos. 1916, polyadenylation signal at pos. 1890 

1 GGGAGACTTG CAATGAAGTT AGAATGAACA GGAGGAGTCT GCAGCTTTTC 
51 AGTGCCTGGG ATAACTATAG TTTAAAGATC ATTGTGTAAA ATAGGATTTT 
101 TAGTCAGCAT GCATTGTTTT AAACCGACTA ACTGATAGCC TAAAACTTTA 
151 TTTTTGCATT TTGCCAATCC TTGGAGTTTT GTTTTGCAGA ATTAAGAAAA 
201 AAATGAATGT ATGATCATCT GAAAAGGGCT TTCTCTCAAT CCCACTTCAT 

251 GGCATGACCT CTGCTGGATC ATTAGTTCTA GCCAGAGAAG TAGCAAAGGA 
301 ACATGACGTC TGAGACCTCC CTTCCCTCAT CAGTGGGGCT GACTGAGCTG 

351 GGGGCTTGAA GCCGGAGGTA ACCTTTCCTG TCGAATGTTT CTTTAGAGAA 
401 TGGCAATGGT CTCTGCGATG TCCTGGGTCC TGTATTTGTG GATAAGTGCT 

451 TGTGCAATGC TACTCTGCCA TGGATCCCTT CAGCACACTT TCCAGCAGCA 

501 TCACCTGCAC AGACCAGAAG GAGGGACGTG TGAAGTGATA GCAGCACACC 

551 GATGTTGCAA CAAGAATCGC ATTGAGGAGC GGTCACAAAC AGTAAAGTGT 

601 TCCTGTCTAC CTGGAAAAGT GGCTGGAACA ACAAGAAACC GGCCTTCTTG 

651 CGTCGATGCC TCCATAGTGA TTTGGAAATG GTGGTGTGAG ATGGAGCCTT 

701 GCCTAGAAGG AGAAGAATGT AAGACACTCC CTGACAATTC TGGATGGATG 

751 TGCGCAACAG GCAACAAAAT TAAGACCACG AGAATTCACC CAAGAACCTA 

801 ACAGAAGCAT TTGTGGTAGT AAAGGAAAAC CAACCCTCTG GAAAATACAT 

851 TTTGAGAATC TCAAACATCT CACATATATA CAAGCCAAAT GGATTTCTTA 

901 CTTGCACTTT GACTGGCTAC CAGATAATCA CAGTGCGTTT AGTGTGTGTA 

951 ACGAAATATC CTACAGTGAG AAGACACAGC GTTTTGGCAT CACCATGGAA 
1001 AGTGGGCTTA AAAAAGGGTC TTCTCAGTGA AATTTTTGGG CATCATGAAG 
1051 AACGATCAAC TATCTTCTAA TTTGAATCTA TAGTTACTTT GTACCATTTG 
1101 AAATATATGT ATATATATAT ATATAATATT TTGAAATATT ATCTATTCTC 
1151 TTCAAGAAAT GAACAGTACC ACAGTTTGAG ACGGCTGGTG TACCCCTTTG 
1201 AGTTTTGGAT GTTTTGTCTG TTTTGCTTTG TTTTGTTAGT CATTTCTTTT 
1251 TCTAACGGCA AGGAAGATAT GTGCCCTTTT GAGAATTCAA GATGGCACTG 
1301 ACACGGGAAG GCCAGCTACA GGTGGACTCC TGGAATTTGA GGCATCATAA 
1351 TGATACTGAA TCAAGAACTT CCTTCTGCTT CTACCAGATG GCCCAAGGAA 
1401 GCACATCGTC CTGTTTTATT GCTTTCTACC CTGTGCAATA TTAGCATGCA 
1451 AGCTTGGCTT ACATAGTCAT ACTTTATATT CAATTGATAT ATAATAACCG 
1501 TTCTAACCTC TTCCAGGAAA ATATTTTTAG AACTACTAGC TTTTCCACTT 
1551 AGAAGAAAAT GAGGATTCTT AAGGGAGCCA CTCCACCATG CTATTAAGAC 
1601 TCTGGCAGAG TTATGGGTAG GATATGGATC CCTACATGAA TAAGTCCTGT 
1651 AAATACAATG TCTTAAGGCT TTGTATAGCT GTCCTAGACT GCAGAAATGT 
1701 CCTCTGATTA AATCCAAAGT CTGGCATCGT TAACTACATA GTGCTGTAGC 
1751 AACAAGTCTT ATCATGGCAT CTCTTTCTAT GTTTGGTTTG CTTTTTCCAA 
1801 GAGTATTCAG GTCTCCTCTT GTGAGATAGG AAGGCCATGA AAACAATTAG 
1851 ATTTCAAGAT GATCTATGTG ACCAAATGTT GGACAGCCCT ATTAAAGTGG 
1901 TAAACAACTT CTTTCTAAAA AAAAAAAAAA AAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 400 bp to 798 bp; peptide length: 133 
Category: similarity to unknown protein 
Classification: no clue 

1 MAMVSAMSWV LYLWISACAM LLCHGSLQHT FQQHHLHRPE GGTCEVIAAH 
51 RCCNKNRIEE RSQTVKCSCL PGKVAGTTRN RPSCVDASIV IWKWWCEMEP 
101 CLEGEECKTL PDNSGWMCAT GNKIKTTRIH PRT 
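
The peptide can be re-derived from the nucleotide listing by translating the ORF codon by codon, which is also a convenient way to check individual residues. The following is a minimal Python sketch using the standard genetic code; the names `CODON_TABLE`, `translate` and `ORF_START` are illustrative and not taken from the source. It translates the first thirty bases of the ORF, that is positions 400 to 429 of the DKFZphfkd2_4b6 insert.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame, stopping at the first stop codon.

    Codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Positions 400-429: the final A of the line starting at 351, then the line starting at 401.
ORF_START = "A" + "TGGCAATGGT" + "CTCTGCGATG" + "TCCTGGGTC"
peptide_start = translate(ORF_START)
print(peptide_start)  # MAMVSAMSWV
```

The seventh codon is ATG, so the seventh residue is methionine, in agreement with the SEQ line of the Pedant report for this entry.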


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4b6, frame 1 

TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA 
sequence, partial cds., N = 1, Score = 242, P = 1.7e-20 

>TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA 
sequence, partial cds. 
Length = 165 

HSPs: 

Score = 242 (36.3 bits), Expect = 1.7e-20, P = 1.7e-20 
Identities = 44/89 (49%), Positives = 58/89 (65%) 

Query: 42 GTCEVIAAHRCCNKNRIEERSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPC 101 

GTCE++ R ++ R QT +C+C G++AGTTR RP+CVDA I+ K WC+M PC 

Sbjct: 76 GTCEIVTLDRDSSQPRRTIARQTARCACRKGQIAGTTRARPACVDARIIKTKQWCDMLPC 135 

Query: 102 LEGEECKTLPDNSGWMCAT-GNKIKTTRI 129 

LEGE C L + SGW C G +IKTT + 
Sbjct: 136 LEGEGCDLLINRSGWTCTQPGGRIKTTTV 164 
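
The Identities figure of this HSP can be reproduced by counting matching columns in the two aligned strings. The sketch below is illustrative Python, with `alignment_identity` as a hypothetical helper name; it assumes that gap characters never count as identities and that the percentage is truncated rather than rounded, which is what the figures in this listing suggest. The Positives count is not reproduced, since it depends on the scoring matrix.

```python
def alignment_identity(query, subject):
    """Return (identical columns, alignment length, truncated percent)."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    length = len(query)
    return identical, length, identical * 100 // length


# Both alignment rows of the HSP above, concatenated without coordinates.
QUERY = (
    "GTCEVIAAHRCCNKNRIEERSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPC"
    "LEGEECKTLPDNSGWMCAT-GNKIKTTRI"
)
SBJCT = (
    "GTCEIVTLDRDSSQPRRTIARQTARCACRKGQIAGTTRARPACVDARIIKTKQWCDMLPC"
    "LEGEGCDLLINRSGWTCTQPGGRIKTTTV"
)

print(alignment_identity(QUERY, SBJCT))  # (44, 89, 49)
```

The count of 44 identical columns over 89 aligned positions matches the reported 44/89 (49%).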

Pedant information for DKFZphfkd2_4b6, frame 1 

Report for DKFZphfkd2_4b6.1 

[LENGTH] 133 
[MW] 15030.64 
[pI] 8.49 
[HOMOL] TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA sequence, partial cds. 4e-20 
[KW] Alpha_Beta 
[KW] SIGNAL_PEPTIDE 26 

SEQ MAMVSAMSWVLYLWISACAMLLCHGSLQHTFQQHHLHRPEGGTCEVIAAHRCCNKNRIEE 
PRD ccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccceeeeeeecccccchhhh 

SEQ RSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPCLEGEECKTLPDNSGWMCAT 
PRD hhhhhhccccccccccccccccccceeeeeehhhhhhccccccccceeeecccccceeec 

SEQ GNKIKTTRIHPRT 
PRD ccccccccccccc 


(No Prosite data available for DKFZphfkd2_4b6.1) 
(No Pfam data available for DKFZphfkd2_4b6.1) 


DKFZphfkd2_4c8 


group: kidney derived 

DKFZphfkd2_4c8 encodes a novel 153 amino acid protein with partial similarity to 
huntingtin-associated protein HAP1. 

The novel protein contains a leucine zipper involved in protein-protein interaction. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 


similarity to KIAA0549 and HAP1 

potential frame shift at bp ~1350-1500 will be checked 

Sequenced by GBF 

Locus: unknown 

Insert length: 3182 bp 

Poly A stretch at pos. 3162, polyadenylation signal at pos. 3135 


1 GGGCTTCCCC CATAGAATTT TTCTTTTCAT TGCCCACTTT ACTGTTTTGG 
51 CTCCAGACTG TCGTTAAGAA TGTACAGCCT AATTCTGGTG TGTTTCGGGA 
101 TATTCTTCTG TCCAGTATTC TGGAAGGGCG GGGAGGCATG GCAGCGTTTT 
151 ACTTGACGTT GATGGTGCTG TGAAGTCCAT TCTTTCCTCT GCAAGACTAC 
201 TGACTATGCA GAAATTTATC GAAGCGGATT ATTATGAACT AGACTGGTAT 
251 TATGAAGAAT GCTCGGATGT TTTATGTGCT GAAAGAGTTG GCCAGATGAC 
301 TAAGACATAT AATGACATAG ATGCTGTCAC TCGGCTTCTT GAGGAGAAAG 
351 AGCGGGATTT AGAATTGGCC GCTCGCATCG GCCAGTCGTT GTTGAAGAAG 
401 AACAAGACCC TAACCGAGAG GAACGAGCTG CTGGAGGAGC AGGTGGAACA 
451 CATCAGGGAG GAGGTGTCTC AGCTCCGGCA TGAGCTGTCC ATGAAGGATG 
501 AGCTGCTTCA GTTCTACACC AGCGCAGCGG AGGAGAGTGA GCCCGAGTCC 
551 GTTTGCTCAA CCCCGTTGAA GAGGAATGAG TCGTCCTCCT CAGTCCAGAA 
601 TTACTTTCAT TTGGATTCTC TTCAAAAGAA GCTGAAAGAC CTTGAAGAGG 
651 AGAATGTTGT ACTTCGATCC GAGGCCAGCC AGCTGAAGAC AGAGACCATC 
701 ACCTATGAGG AGAAGGAGCA GCAGCTGGTC AATGACTGCG TGAAGGAGCT 
751 GAGGGATGCC AATGTCCAGA TTGCTAGTAT CTCAGAGGAA CTGGCCAAGA 
801 AGACGGAAGA TGCTGCCCGC CAGCAAGAGG AGATCACACA CCTGCTATCG 
851 CAAATAGTTG ATTTGCAGAA AAAGGCAAAA GCTTGCGCAG TGGAAAATGA 
901 AGAACTTGTC CAGCATCTGG GGGCTGCTAA GGATGCCCAG CGGCAGCTCA 
951 CAGCCGAGCT GCGTGAGCTG GAGGACAAGT ACGCAGAGTG CATGGAGATG 
1001 CTGCATGAGG CGCAGGAGGA GCTGAAGAAC CTCCGGAACA AAACCATGCC 
1051 CAATACCACG TCTCGGCGCT ACCACTCACT GGGCCTGTTT CCCATGGATT 
1101 CCTTGGCAGC AGAGATTGAG GGAACGATGC GCAAGGAGCT GCAGTTGGAA 
1151 GAGGCCGAGT CTCCAGACAT CACTCACCAG AAGCGTGTCT TTGAGACAGT 
1201 AAGAAACATC AACCAGGTTG TCAAGCAGAG ATCTCTGACC CCTTCTCCCA 
1251 TGAACATCCC CGGCTCCAAC CAGTCCTCGG CCATGAACTC CCTCCTGTCC 
1301 AGCTGCGTCA GCACCCCCCG GTCCAGCTTC TACGGCAGCG ACATAGGCAA 
1351 CGTCGTCCTC GACAACAAGA CCAACAGCAT CATTCTGGAA ACAGAGGCAG 
1401 CCGACCTGGG AAACGATGAG CGGAGTAAGA AGCCGGGGAC GCCGGGCACC 
1451 CCCAGGCTCC CACGACCTGG AGACGGCGCT GAGGCGGCTG TCCCTGCGCC 
1501 GGGAGAACTA CCTCTCGGAG AGGAGGTTCT TTGAGGAGGA GCAAGAGAGG 
1551 AAGCTCCAGG AGCTGGCGGA GAAGGGCGAG CTGCGCAGCG GCTCCCTCAC 
1601 ACCCACTGAG AGCATCATGT CCCTGGGCAC GCACTCCCGC TTCTCCGAGT 
1651 TCACCGGCTT CTCTGGCATG TCCTTCAGCA GCCGCTCCTA CCTGCCTGAG 
1701 AAGCTCCAGA TCGTGAAGCC GCTGGAAGGT GATCACGCGG GGCCTCGGCC 
1751 CCTCTCTGTC CTCCTGGGGG ACTCCCTTTG GTCCCTGATC CACCTGCGGA 
1801 AGGCGGGGCA CCTCTGTCAC GCCTACTCCT TTTTCTTCCG CGACAGCCAC 
1851 CCGCGCTGCT GGTTTGAGTT CCTCTGAGGG TGGTGCTCAG CCTAGGCCTC 
1901 CGTCCCTCCC CTCTGGCTGG CAGGTGTGAC AATGCACACA TAGGCCATGA 
1951 AACTCGCCGA GGAAAGACAA GCATGTGCAC TGTGGTCTTC TAGTTCTTTC 
2001 CTTTGCCTTT AGAACCTTAG AAATAAAAAC TTTTGTGGCG GTAGAGGCAC 
2051 TGCTAACTGA TTCAAAAATT AATTAGGTTT TGCCTGTGGG TGTGAGGAAT 
2101 GCAGAAAATT AATGCTTTAG CTTTTCTGCA GTTTTGGTGT CGGGGAGAGG 
2151 TTCCAAGCAA ACTCTATTAA ATGGGGATTT TTTTTTCCCC ATAACCACCT 
2201 GAATGTGATT TGTGGGCTTA TGTGTTCTGA TTTGAACTTC ATATAGCAAG 
2251 GTTGTGGCTT TTGGCAGATG CAGTATGTTC TGAGCGCGGC TCCTAGAGTC 
2301 TACAATTTGG AGTCCAGGAA GGGGTGGCTG TGGAGACAAG TGAGTTTTGT 
2351 ACCTCCGTAA GCCACCCTTT TTCAGGGTCA GTTCATGTGT TAGTATCAGG 
2401 GGCATCTCAG ATGATTAAAC TCATGGGAAA AACTTCCTCC TTCCCTCTCT 
2451 CCCTCTTGCC CTCCTGCCTC tTTTTTTTTT tTTTTTTTTT AATTTGGGCA 
2501 CTTATAAAAT GTTTTCCCTC TACCTGCTGC TACTCTGCCA agagccacca 
2551 AGTGCTTATA TTTTTCATTT TTTACTCCTT TAGTTTGGAA AGCCATATAC 
2601 GTTTGAGAAG GTGTTTTAAA ACTCTGTGTT ACACTTACGA tgcaaagcca 
2651 AATCAGAACT TCTGTAAGGC AGAACTTTCC CAACTTTAAA aaaattattg 


2701 TCCCCTCTAG GAGCCTTCTT AGACGTTTTT TCCTAATCAC CCCCCAAAGA 
2751 CATTTTAATA CCACATATAT ATTGTTTATG TACTATATGT ATATACATAA 
2801 ACAATACATA AGCAATACAT CTGTGGTATT AAAATTAAAA AGAATCCAAT 
2851 TATGTTTACC TCAAAAGAAC CTGTTTTTGC TTCTTGGGAG CAATATTGCC 
2901 CCTGTGAGAC TGCATGCTAT AAGGTAAGGT TGTGCTTGTT AAAGACCCAA 
2951 GACATGACTG GGTTCCACAG TCTCCAAAGG AAGAGGGTGG GCTAGTTTGT 
3001 TTTTATTATT ATTTTAAAAT TGTATAATTG GGGTCTTTCT TAGAGTTCAG 
3051 AAAAGGTATA GCTTACTCTT TTTTAATTGT TTATTTAGTT GTAAGCTTAG 
3101 TGATTGTTTT CTGATCCACA TTGTGTGTGT TCTTCAATAA AATCTTTCAT 
3151 TTCTGCAATT TTAAAAAAAA AAAAAAAAAA AA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 206 bp to 1531 bp; peptide length: 442 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (139-161) 
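
The leucine zipper hit can be reproduced with a simple pattern search. The sketch below is illustrative Python: it assumes the PROSITE leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, written here as a regular expression, and `find_leucine_zippers` is a hypothetical helper name. The segment used is residues 131 to 170 of the frame 2 peptide, as translated from nucleotides 596 to 715 of this entry.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L, wrapped in a lookahead so overlapping hits are found.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(segment, first_residue):
    """Return 1-based inclusive (start, end) residue ranges of pattern hits.

    `first_residue` is the residue number of the first letter in `segment`.
    """
    return [
        (first_residue + m.start(),
         first_residue + m.start() + len(m.group(1)) - 1)
        for m in LEUCINE_ZIPPER.finditer(segment)
    ]


# Residues 131-170 of the frame 2 peptide of DKFZphfkd2_4c8.
SEGMENT = "QNYFHLDSLQKKLKDLEEENVVLRSEASQLKTETITYEEK"

hits = find_leucine_zippers(SEGMENT, 131)
print(hits)  # [(139, 160)]
```

The match covers the four leucines at residues 139, 146, 153 and 160. The listing gives the range as 139-161, so its end coordinate appears to be one past the last matched residue; the source does not state its convention.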


1 MQKFIEADYY ELDWYYEECS DVLCAERVGQ MTKTYNDIDA VTRLLEEKER 
51 DLELAARIGQ SLLKKNKTLT ERNELLEEQV EHIREEVSQL RHELSMKDEL 
101 LQFYTSAAEE SEPESVCSTP LKRNESSSSV QNYFHLDSLQ KKLKDLEEEN 
151 VVLRSEASQL KTETITYEEK EQQLVNDCVK ELRDANVQIA SISEELAKKT 
201 EDAARQQEEI THLLSQIVDL QKKAKACAVE NEELVQHLGA AKDAQRQLTA 
251 ELRELEDKYA ECMEMLHEAQ EELKNLRNKT MPNTTSRRYH SLGLFPMDSL 
301 AAEIEGTMRK ELQLEEAESP DITHQKRVFE TVRNINQVVK QRSLTPSPMN 
351 IPGSNQSSAM NSLLSSCVST PRSSFYGSDI GNVVLDNKTN SIILETEAAD 
401 LGNDERSKKP GTPGTPRLPR PGDGAEAAVP APGELPLGEE VL 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 2 

PIR:S72555 huntingtin-associated protein HAP1 - human (fragment), N = 
1, Score = 234, P = 8.6e-19 

TREMBL:CEUT27A3_7 gene: "T27A3.1"; Caenorhabditis elegans cosmid 
T27A3., N = 1, Score = 226, P = 9.9e-16 

PIR:S67495 huntingtin-associated protein HAP1-A - rat, N = 1, Score = 
215, P = 1.6e-14 


>PIR:S72555 huntingtin-associated protein HAP1 - human (fragment) 
Length = 320 

HSPs: 

Score = 234 (35.1 bits), Expect = 8.6e-19, P = 8.6e-19 
Identities = 66/189 (34%), Positives = 110/189 (58%) 

Query: 109 EESEPESVCSTPLKRNE--SSSSVQNYFH---LDSLQKKLKDLEEENVVLRSEASQLKTE 163 
 EE+E + C+ P + S ++ + H L++LQ+KL+ LEEEN LR EASQL T 
Sbjct: 28 EEAEEDLQCAHPCDAPKLISQEALLHQHHCPQLEALQEKLRLLEEENHQLREEASQLDT- 86 

Query: 164 TITYEEKEQQLVNDCVKELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKK 223 
 E++EQ L+ +CV++ +A+ Q+A +SE L + E+ RQQ+E+ L +Q++ LQ++ 
Sbjct: 87 LEDEEQMLILECVEQFSEASQQMAELSEVLVLRLENYERQQQEVARLQAQVLKLQQR 143 

Query: 224 AKACAVENEELVQHLGAAKDAQRQLTAE--LRELEDKYAECME--MLHEAQEELKNL-RN 278 
 + E E+L +L+K+QQLEL ++AE4 + +++RN 
Sbjct: 144 CRMYGAETEKLQKQLASEKEIQMQLQEEETLPGFQETLAEEIiRTSLRRMISDPVYFMERN 203 

Query: 279 KTMP--NTTSRRY 289 
 MP +T+S RY 
Sbjct: 204 YEMPRGDTSSLRY 216 


Peptide information for frame 3 


ORF from 1416 bp to 1874 bp; peptide length: 153 
Category: similarity to known protein 
Classification: unset 


1 MSGVRSRGRR APPGSHDLET ALRRLSLRRE NYLSERRFFE EEQERKLQEL 
51 AEKGELRSGS LTPTESIMSL GTHSRFSEFT GFSGMSFSSR SYLPEKLQIV 
101 KPLEGDHAGP RPLSVLLGDS LWSLIHLRKA GHLCHAYSFF FRDSHPRCWF 
151 EFL 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 3 

TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds., N = 1, Score = 252, P 
= 5.5e-21 


>TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds. 
Length = 469 


HSPs: 


Score = 252 (37.8 bits), Expect = 5.5e-21, P = 5.5e-21 
Identities = 57/98 (58%), Positives = 69/98 (70%) 

Query: 8 GRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGSLTPTESI 67 
 G+ P G DL TAL RLSLRR+NYLSE++FF EE +RK+Q LA++ E SG +TPTES+ 
Sbjct: 27 GQPGPSGDSDLATALHRLSLRRQNYLSEKQFFAEEWQRKIQVLADQKEGVSGCVTPTESL 86 

Query: 68 MSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEG 105 
 SL T SE T S S R ++PEKLQIVKPLEG 
Sbjct: 87 ASLCTTQ--SEITDLSSAS-CLRGEMPEKLQIVKPLEG 121 


Pedant information for DKFZphfkd2_4c8, frame 2 


Report for DKFZphfkd2_4c8.2 


[LENGTH] 442 
[MW] 50020.14 
[pI] 4.77 
[HOMOL] TREMBL:AF040723_1 product: "neuroanl"; Homo sapiens neuroanl mRNA, complete cds. 5e-29 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-08 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YIL149c] 5e-08 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YIL138c] 6e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR130c] 2e-07 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-06 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-06 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-06 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-06 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 2e-05 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 2e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 1e-04 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 1e-04 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YNL272c] 3e-04 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YNL272c] 3e-04 
[BLOCKS] BL01289B 
[BLOCKS] BL00415M Synapsins proteins 
[EC] 3.6.1.32 Myosin ATPase 2e-07 
[PIRKW] tandem repeat 2e-07 
[PIRKW] heterodimer 1e-06 
[PIRKW] endocytosis 9e-07 
[PIRKW] heart 1e-06 
[PIRKW] transmembrane protein 4e-07 
[PIRKW] zinc finger 9e-07 
[PIRKW] metal binding 9e-07 
[PIRKW] DNA binding 3e-06 
[PIRKW] muscle contraction 2e-07 
[PIRKW] acetylated amino end 3e-06 
[PIRKW] actin binding 2e-07 
[PIRKW] mitosis 1e-06 
[PIRKW] microtubule binding 1e-06 
[PIRKW] ATP 2e-07 
[PIRKW] chromosomal protein 1e-06 
[PIRKW] receptor 3e-08 
[PIRKW] thick filament 2e-07 
[PIRKW] phosphoprotein 8e-06 
[PIRKW] glycoprotein 3e-08 
[PIRKW] skeletal muscle 3e-06 
[PIRKW] DNA condensation 1e-06 
[PIRKW] alternative splicing 2e-06 
[PIRKW] coiled coil 2e-07 
[PIRKW] P-loop 2e-07 
[PIRKW] heptad repeat 4e-07 
[PIRKW] methylated amino acid 2e-07 
[PIRKW] peripheral membrane protein 9e-07 
[PIRKW] cardiac muscle 6e-06 
[PIRKW] hydrolase 2e-07 
[PIRKW] muscle 2e-06 
[PIRKW] cytoskeleton 2e-06 
[PIRKW] Golgi apparatus 4e-07 
[PIRKW] calmodulin binding 9e-07 
[SUPFAM] myosin motor domain homology 2e-07 
[SUPFAM] tropomyosin TPM1 2e-06 
[SUPFAM] giantin 4e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06 
[SUPFAM] human early endosome antigen 1 9e-07 
[SUPFAM] unassigned kinesin-related proteins 4e-07 
[SUPFAM] M5 protein 8e-08 
[SUPFAM] cytoskeletal keratin 3e-06 
[SUPFAM] myosin heavy chain 2e-07 
[SUPFAM] conserved hypothetical P115 protein 1e-06 
[SUPFAM] centromere protein E 1e-06 
[SUPFAM] pleckstrin repeat homology 2e-06 
[SUPFAM] kinesin motor domain homology 4e-07 
[PROSITE] LEUCINE_ZIPPER 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 6.79 % 
[KW] COILED_COIL 27.15 % 


SEQ MQKFIEADYYELDWYYEECSDVLCAERVGQMTKTYNDIDAVTRLLEEKERDLELAARIGQ 

r.J^ * ' ' iili^mi' '1 xxxxxxxxxxxxxxx . . . 

COILS '^^^ ^^'^^^*^^^^^^***^*^^^^^^^^'^^*^^^^^^h^'^Wi*iJ^hhhhhhhhhhhhhh 

SEQ SLLKKNKTLTERNELLEEQVEHIREEVSQLRHELSMKDELLQFYTSAAEESEPESVCSTP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhlihJ^hhhhhlihhhhjihliiiiililihhlihhhhhhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LKRNESSSSVQNYFHLDSLQKKLKDLEEENWLRSEASQLKTETITYEEKEQQLVNDCVK 

SEG 

55?,^ ^^*»h*^^^hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlilihhhiihlihhhh]ihhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
SEQ ELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKKAKACAVENEELVQHLGA

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

^^^^^ CCCCCCCCCCC 

SEQ AKDAQRQLTAELRELEDKYAECMEMLHEAQEELKNLRNKTMPNTTSRRYHSLGLFPMDSL

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ AAEIEGTMRKELQLEEAESPDITHQKRVFETVRNINQVVKQRSLTPSPMNIPGSNQSSAM

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhli^ 

COILS 

SEQ NSLLSSCVSTPRSSFYGSDIGNVVLDNKTNSIILETEAADLGNDERSKKPGTPGTPRLPR 

bdS **■•• xxxxxxxxxxx 

COILS ^^^^^^^^^^^^^^^^®^®®®^'^*^^^®®^cccccccccccccccccccccccccc 

SEQ PGDGAEAAVPAPGELPLGEEVL 

SEG xxxx 

PRD cccccccccccccccccccccc 

COILS 


Prosite for DKFZphfkd2_4c8.2
PS00029 139->161 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphfkd2_4c8.2)

Pedant information for DKFZphfkd2_4c8, frame 3

Report for DKFZphfkd2_4c8.3

[LENGTH] 153
[MW] 17642.03
[pI] 9.38
[HOMOL] gene: "KIAA0549"; product: "KIAA0549 protein"; Homo sapiens mRNA for KIAA0549 protein, partial cds. 2e-12
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 12.42 %


SEQ MSGVRSRGRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGS 

xxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccc 

SEQ LTPTESIMSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEGDHAGPRPLSVLLGDS 
SEG 

PRD cccccceeeccccceeeccccccccccccccccchhhhhhhhcccccccccceeeeeccc 

SEQ LWSLIHLRKAGHLCHAYSFFFRDSHPRCWFEFL 
SEG . 

PRD chhhhhhhhhcccccceeeeecccccccccccc 

(No Prosite data available for DKFZphfkd2_4c8.3)
(No Pfam data available for DKFZphfkd2_4c8.3)




DKFZphfkd2_4k14


group: intracellular transport and trafficking

DKFZphfkd2_4k14.3 encodes a novel 254 amino acid putative GTP-binding protein nearly identical
to Rab6.

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and
endocytotic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of
transport vesicles to their acceptor membranes.

Rab6 is a ubiquitous Ras-like GTPase involved in intra-Golgi transport.

The new protein can find application in modulating the transport of vesicles inside the Golgi 
apparatus. 


strong similarity to Rab6 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3084 bp 

Poly A stretch at pos. 3061, polyadenylation signal at pos. 3043 


1 GGGGCACTCA GCAGGTTGGG CTGCGGCGGC GGCGGCTGGG GAAGCCGAAG 
51 CGCCGCGCGT GAGAGATCCC GGATACATCT GCGGTTTGGG CTCCGCCACC 
101 CTCCGTCTCT CTCCCGCAGG TCTCTGAGCC GGGTGCGGAA GGAGGGAACG 
151 GCCCTAGCCT TGGGAAGCCA AAGCACACCC CTGGCTCCCG CCGACACCGC 
201 CCTCCTTCCC TTCCCAGCCG CGGGCCTCGC TCCGTGCTCG GCTACTCTGC 
251 CGGGAGGCGG CGGCGGCTGC CAGTCTGTGG CGAGCCCTGC TGCCCTCCAG 
301 CCGGGCTTCT CCAGCCGGGC TCCTCCACCG GCCCTTGCAG GGGCACAGAG 
351 AGCTCGGCGC CCGCCCTTCC GCTCGCCTTT TTCGTCAGCC GGCTGGAGGA 
401 GCATCGGTCC GGGAGGTCTC TGGGCTGAGG CGGCGACAGC TCCTCTAGTT 
451 CCACCATGTC CGCGGGCGGA GACTTCGGGA ATCCGCTGAG GAAATTCAAG 
501 CTGGTGTTCC TGGGGGAGCA AAGCGTTGCA AAGACATCTT TGATCACCAG 
551 ATTCAGGTAT GACAGTTTTG ACAACACCTA TCAGGCAATA ATTGGCATTG 
601 ACTTTTTATC AAAAACTATG TACTTGGAGG ATGGAACAAT CGGGCTTCGG 
651 CTGTGGGATA CGGCGGGTCA GGAACGTCTC CGTAGCCTCA TTCCCAGGTA 
701 CATCCGTGAT TCTGCTGCAG CTGTAGTAGT TTACGATATC ACAAATGTTA 
751 ACTCATTCCA GCAAACTACA AAGTGGATTG ATGATGTCAG AACAGAAAGA 
801 GGAAGTGATG TTATCATCAC GCTAGTAGGA AATAGAACAG ATCTTGCTGA 
851 CAAGAGGCAA GTGTCAGTTG AGGAGGGAGA GAGGAAAGCC AAAGGGCTGA 
901 ATGTTACGTT TATTGAAACT AGGGCAAAAA CTGGATACAA TGTAAAGCAG 
951 CTCTTTCGAC GTGTAGCAGC AGCTTTGCCG GGAATGGAAA GCACACAGGA 
1001 CGGAAGCAGA GAAGACATGA GTGACATAAA ACTGGAAAAG CCTCAGGAGC 
1051 AAACAGTCAG CGAAGGGGGT TGTTCCTGCT ACTCTCCCAT GTCATCTTCA 
1101 ACCCTTCCTC AGAAGCCCCC TTACTCTTTC ATTGACTGCA GTGTGAATAT 
1151 TGGCTTGAAC CTTTTCCCTT CATTAATAAC GTTTTGCAAT TCATCATTGC 
1201 TGCCTGTCTC GTGGAGGTGA TCTATTAGCT TCACAAGCAC AAAAAAAGTC 
1251 AGCGTCTTCA TTATTTATAT TTTACAAAAA GCCAAATTAT TTCAGCATAT 
1301 TCCGGTGATA ACTTTAAAAA TTAGATACAT TTTCTTAACA TTTTTTTCTT 
1351 TTTTAATGTT ATGATAATGT ACTTCAAAAT GATGGAAATC TCAACAGTAT 
1401 GAGTATGGCT TGGTTAACGA GCAGTATGTT CACAGCCTGC TTTATCTCTC
1451 CTTGCTCTTC TCACCTCTCC CTTACCCCGT TCCCTATTTC CGTGTTCTTA 
1501 CCTAGCCTCC CCCCACTTCC TCAAAACAAA CAAGAGATGG CAAAGCAGCA 
1551 GTCCGACCAA GCCCACTGGA ATTATCCTTT AATTTTACAG ATACCACTTG 
1601 CTGTAGGCTG TGGACCAAGA TGTCCAGAAT TATTCTTGAG CACTGATGTA 
1651 AATTACTTAG ATCTTCTTTG AGGTCAGAAT TCAGCGATCA CGGTAGGCAG 
1701 TGCTTGAATG AGAAAAGCCT CCTGGTGCAT CTTCAAAATG AGTCCTAAAG 
1751 AACATACTGA GTACTTATAA GTAGCAGAAC ATAAAATGTA TTTCTGACTA 
1801 ACACAAATGG TCCTTTCACA TGTGCTTTAT TAGACTCTGG GAGAGAAAAG 
1851 TAACCAAGTG CTTCAGAACA GGTTTTTAGT ATTTACTTCT TCATGGTAAG 
1901 ATAATGAAGT TCTAATGAAC TATTTCTCCC AAGGTTTTAA AATTGTCAAG 
1951 AGTTATTCTG TTTGTTTAAA AAGTAAGAAA CCTCTGTAAG CAATAGATTT 
2001 TGCTTGGGTT TTCTTTCTTA AAAAAATAAT ACTATGCAGG CAAGACACCA 
2051 TAAAAGTTTA ATTCCTTACA GAAGAACCAG TGGAAGAATT TAAATTTGGC 
2101 ACTACGATCA AAACTACTGA ATTAGCAGAA ATAACGATAT CTAAAGCTTA 
2151 CCAGCAAAAG AACCCTCAGC AGAATAGCAA AAACTTTGCT CAGGACATTT 
2201 GAGGTCAAAT TGAAGACGGA AGACGGAAAC CGGAAACCGT TTTCTTGTAA 
2251 GCCCCTAGAG GCAGATCAGG TAAGCATACA TAGTAGAGGG AAAGGAGAGA 
2301 ATGGAAATAA AACTGAATAT TATGCAGATT TATGCCTTAT TTTTTAGCAT 
2351 TTTTTAAGGT TGGGTCTTTC AGGCTGGTTT TGGTTTGTAT TAGATCTGTA 
2401 TAGTTTAGTG ATTTAGTTTT ATATTTAAGC TACGATTAAT ATTTTTTCTT 
2451 TGGCGATATT TCTTTGCTTT TTTTTTTTAA CAACTTTCCA TTTTTAGATG 
2501 TTTCGTTGAA TCTATTTAGA GCTTCACCAT GGCAATATGT ATTTCCCTTA
2551 AAACACTGCA AACAAATATA CTAGGAGTGT GCCCTTTTAA TCTTTACTAG
2601 TTATTGTGAG ACTGCTGTGT AAGCTAATAA ACACATTTGT AAAAACATTG 
2651 TTTGCAGGAA GAAAACTTCG AGTTACAGGT CAGGAAAAGC CTGCTGAATT 
2701 TATGTTGTAA ACGTTACTTA ACACAGTATA AAGATGAAAA GACAACAAAA 
2751 GTATCTTCAT ACTTCCTCAT CCCCTCATTG CAACAAAACC TTAAACTGGG 
2801 AGAACCTTAG TCCCCTCTCT TTCCTCTTCC TCCTCCACTT CCCACTTATT 
2851 GCCACTTTGT AATATTCAGA GAGCACTTGG ATTATGGATC TGAATAGAGA 
2901 AATGCTTACA GATAATCATT AGCCCACATA CCAGTAACTT ATACTTAAAG 
2951 ATGGGATGGA GTTATAAAGT GCTTTTATAA TCCAATATAA TTGCTAAAGG 
3001 CAAGGGTTGA CTCTTTGTTT TATTTTGACA TGGCATGTCC TGAAATAAAT 
3051 ATTGGTTCAC TATGAAAAAA AAAAAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


98382468: 
Rab proteins. 

97203146: 

GTP-bound forms of rab6 induce the redistribution of Golgi 
proteins into the endoplasmic reticulum. 


Peptide information for frame 3 


ORF from 456 bp to 1217 bp; peptide length: 254 
Category: strong similarity to known protein
Classification: unset

Prosite motifs: BACTERIAL_OPSIN_RET (45-57)


1 MSAGGDFGNP LRKFKLVFLG EQSVAKTSLI TRFRYDSFDN TYQAIIGIDF
51 LSKTMYLEDG TIGLRLWDTA GQERLRSLIP RYIRDSAAAV VVYDITNVNS
101 FQQTTKWIDD VRTERGSDVI ITLVGNRTDL ADKRQVSVEE GERKAKGLNV
151 TFIETRAKTG YNVKQLFRRV AAALPGMEST QDGSREDMSD IKLEKPQEQT
201 VSEGGCSCYS PMSSSTLPQK PPYSFIDCSV NIGLNLFPSL ITFCNSSLLP
251 VSWR
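
The relationship between the ORF coordinates and the peptide listing can be checked with a short sketch. It assumes 1-based, inclusive nucleotide positions that exclude the stop codon (which is what 456..1217 giving 254 residues implies) and the standard genetic code; the helper names are hypothetical and are not part of the annotation pipeline used for this report.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Residue count for an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def reading_frame(start: int) -> int:
    """Forward reading frame (1, 2 or 3) of an ORF starting at `start`."""
    return (start - 1) % 3 + 1


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring blanks and a ragged tail."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 24 nucleotides of the ORF (positions 456-479 of the insert).
ORF_HEAD = "ATGTC CGCGGGCGGA GACTTCGGG"

print(orf_peptide_length(456, 1217), reading_frame(456), translate(ORF_HEAD))
```

The same arithmetic applies to any of the "ORF from ... bp to ... bp" lines in these reports, as long as the coordinates follow the same convention.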

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4k14, frame 3

PIR:G34323 GTP-binding protein Rab6 - human, N = 1, Score = 944, P =
6.5e-95

TREMBL:CET25G12_2 gene: "T25G12.4"; Caenorhabditis elegans cosmid
T25G12., N = 1, Score = 756, P = 5.4e-75

TREMBL:NTNTRAF_1 gene: "Nt-rab6"; Nicotiana tabacum SR1 Nt-rab6 mRNA,
complete cds., N = 1, Score = 698, P = 7.6e-69

TREMBL:D84314_1 product: "rab6"; Drosophila melanogaster mRNA for
rab6, complete cds., N = 1, Score = 836, P = 1.9e-83

PIR:T01588 small GTP-binding protein F16B22.10 - Arabidopsis thaliana,
N = 1, Score = 704, P = 1.8e-69


>PIR:G34323 GTP-binding protein Rab6 - human
Length = 208

HSPs:

Score = 944 (141.6 bits), Expect = 6.5e-95, P = 6.5e-95
Identities = 186/208 (89%), Positives = 190/208 (91%)
Query:   1 MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 60
           MS GGDFGNPLRKFKLVFLGEQSV KTSLITRF YDSFDNTYQA IGIDFLSKTMYLED
Sbjct:   1 MSTGGDFGNPLRKFKLVFLGEQSVGKTSLITRFMYDSFDNTYQATIGIDFLSKTMYLEDR 60

Query:  61 TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120
           T+ L+LWDTAGQER RSLIP YIRDS  AVVVYDITNVNSFQQTTKWIDDVRTERGSDVI
Sbjct:  61 TVRLQLWDTAGQERFRSLIPSYIRDSTVAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120

Query: 121 ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 180
           I LVGN+TDLADKRQVS+EEGERKAK LNV FIET AK GYNVKQLFRRVAAALPGMEST
Sbjct: 121 IMLVGNKTDLADKRQVSIEEGERKAKELNVMFIETSAKAGYNVKQLFRRVAAALPGMEST 180

Query: 181 QDGSREDMSDIKLEKPQEQTVSEGGCSC 208
           QD SREDM DIKLEKPQEQ VSEGGCSC
Sbjct: 181 QDRSREDMIDIKLEKPQEQPVSEGGCSC 208
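
The "Identities" figure reported for this HSP can be recomputed directly from the aligned rows. The sketch below is illustrative only: the function name is hypothetical, the rows are the Query and Sbjct lines of the Rab6 alignment (PIR:G34323) above, and the alignment is assumed to be ungapped, which holds for this HSP. It does not reproduce the "Positives" count, which depends on the scoring matrix used by the search program.

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned columns in which both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Query (DKFZphfkd2_4k14.3, residues 1-208) and subject (human Rab6) rows.
QUERY = (
    "MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG"
    "TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI"
    "ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST"
    "QDGSREDMSDIKLEKPQEQTVSEGGCSC"
)
SBJCT = (
    "MSTGGDFGNPLRKFKLVFLGEQSVGKTSLITRFMYDSFDNTYQATIGIDFLSKTMYLEDR"
    "TVRLQLWDTAGQERFRSLIPSYIRDSTVAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI"
    "IMLVGNKTDLADKRQVSIEEGERKAKELNVMFIETSAKAGYNVKQLFRRVAAALPGMEST"
    "QDRSREDMIDIKLEKPQEQPVSEGGCSC"
)

identical = count_identities(QUERY, SBJCT)
percent = round(100 * identical / len(QUERY))
print(f"Identities = {identical}/{len(QUERY)} ({percent}%)")
```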


Pedant information for DKFZphfkd2_4k14, frame 3


Report for DKFZphfkd2_4k14.3


[LENGTH] 254
[MW] 28385.29
[pI] 7.58
[HOMOL] PIR:G34323 GTP-binding protein Rab6 - human 1e-102
[FUNCAT] 08.07 vesicular transport (Golgi network, etc.) [S. cerevisiae, YLR262c] 7e-60
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YLR262c] 7e-60
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YOR089c] 2e-33
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR089c] 2e-33
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YOR089c] 2e-33
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 2e-33
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 3e-28
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 8e-27
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 8e-27
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YOR101w] 2e-21
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 6e-19
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 6e-19
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 6e-19
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 6e-16
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 6e-16
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 6e-16
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 4e-13
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 4e-13
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 2e-09
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-08
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-08
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNLlOOc] 1e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 5e-05
[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins
[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 1e-32
[SCOP] d1mh1 3.29.1.4.2 Rac1 (human (Homo sapiens) 2e-51
[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein (human (Homo sapiens) 7e-53
[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) (human (Hom 1e-46
[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) (Do 6e-60
[PIRKW] nucleus 2e-14
[PIRKW] cell cycle control 5e-15
[PIRKW] membrane trafficking 3e-71
[PIRKW] endoplasmic reticulum 1e-29
[PIRKW] phosphoprotein 1e-29
[PIRKW] prenylated cysteine 2e-36
[PIRKW] signal transduction 5e-15
[PIRKW] transforming protein 5e-30
[PIRKW] purine nucleotide binding 1e-28
[PIRKW] alternative splicing 1e-18
[PIRKW] P-loop 3e-71
[PIRKW] lipoprotein 2e-36
[PIRKW] proto-oncogene 1e-20
[PIRKW] methylated carboxyl end 1e-20
[PIRKW] membrane protein 1e-29
[PIRKW] GTP binding 3e-71
[PIRKW] thiolester bond 1e-29
[PIRKW] Golgi apparatus 1e-29
[SUPFAM] ras transforming protein 1e-76
[PROSITE] BACTERIAL_OPSIN_RET 1
[PFAM] Ras family (contains ATP/GTP binding P-loop)
[KW] Alpha_Beta
[KW] 3D
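
Report lines of the form `[TAG] description e-value`, such as `[PIRKW] GTP binding 3e-71`, lend themselves to simple machine parsing. The following is a minimal sketch under the assumption that a trailing token like `3e-71` is the e-value and everything between the tag and that token is free text; the class and function names are hypothetical and do not belong to any published Pedant interface.

```python
import re
from typing import NamedTuple, Optional

# [TAG], free-text description, optional trailing e-value such as 3e-71.
LINE_RE = re.compile(r"^\[(\w+)\]\s+(.*?)(?:\s+(\d+(?:\.\d+)?e-\d+))?$")


class PedantEntry(NamedTuple):
    tag: str
    description: str
    e_value: Optional[float]


def parse_pedant_line(line: str) -> Optional[PedantEntry]:
    """Split one report line into tag, description and e-value (if present)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not a tagged report line
    tag, description, e_value = match.groups()
    return PedantEntry(tag, description, float(e_value) if e_value else None)


for raw in ("[PIRKW] GTP binding 3e-71", "[KW] Alpha_Beta", "[MW] 28385.29"):
    print(parse_pedant_line(raw))
```

Counts such as the trailing `1` in `[PROSITE] BACTERIAL_OPSIN_RET 1` are deliberately left in the description, since they are not e-values.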


SEQ MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG

1kao- CCEEEEEEECTTTTCHHHHHHHHHHCCCCCCCTTTTC-EEEEEEEEETTE

SEQ TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI

1kao- EEEEEEEECCTTTTCHHHHHHHHHHCCEEEEEEETTTHHHHHHHHHHHHHHHHHTTTCCC

SEQ ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST

1kao- EEEEEETTTTGGGCCCCHHHHHHHHHHHCCCEEECTTTTHHHHHHHHHHH

SEQ QDGSREDMSDIKLEKPQEQTVSEGGCSCYSPMSSSTLPQKPPYSFIDCSVNIGLNLFPSL

1kao-

SEQ ITFCNSSLLPVSWR

1kao-


Prosite for DKFZphfkd2_4k14.3

PS00327 45->57 BACTERIAL_OPSIN_RET PDOC00291


Pfam for DKFZphfkd2_4k14.3


HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 

KLV++G+ +V K++L RF +++F++ Y + IG+DF++KT++-I-++ TI 
Query 15 KLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDGTIG 63 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENI rNWweEIrR 

L +WDTAGQER RS+ P y+R++ ++++VYDITN SF+ ++W++++R+ 
Query 64 LRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRT 113

HMM HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
+ ++V+I LVGN +DL+D+RQVS EEG+ A+ ++ + F+ET AKT+ 
Query 114 ERG--SDVIITLVGNRTDLADKRQVSVEEGERKAKGLN-VTFIETRAKTG 160

HMM iNVEEAFMEIvRellqrMqe . q . NqteNinidQpsrnrJc rCCCIM* 

+NV++ F ++♦♦- +++ +++ + +++++ + + I+ ++++ + +C+ + 

Query 161 YNVKQLFRRVAAALPGMESTQDGSREDMSDIKLEKPQEQTVSEGGCS-C 208 
DKFZphfkd2_4m11


group: transmembrane protein

DKFZphfbr2-4m11 encodes a novel 159 amino acid protein with weak similarity to the putative
membrane protein YMR034c of S. cerevisiae.

The novel protein contains 4 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific
genes and as a new marker of neuronal cells.


weak similarity to YMR034c 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 1749 bp 

Poly A stretch at pos. 1727, polyadenylation signal at pos. 1713 

1 GGGGTCCTCA AAGCCGCCGG AGCAACCCCC AGGTCTTTAC TTTACAATCG 
51 GCAATTTGAC TTGCTCTGCT GCATGTCTGG AGGGACCAAG GAAAGTGTGG 
101 AGACGCTCCA AGGATTAGGT GATCGGAGCT TGAAAAGAAA AAAAGCCAAA 
151 CAAATAAACA AAACCCACCC ACCCTAACGA ATATGAGGCT GCTGGAGAGA 
201 ATGAGGAAAG ACTGGTTCAT GGTCGGAATA GTGCTGGCGA TCGCTGGAGC 
251 TAAACTGGAG CCGTCCATAG GGGTGAATCG GGGACCACTG AAGCCAGAAA
301 TAACTGTATC CTACATTGCT GTTGCAACAA TATTCTTTAA CAGTGGACTA 
351 TCATTGAAAA CAGAGGAGCT GACCAGTGCT TTGGTGCATC TAAAACTGCA 
401 TCTTTTTATT CAGATCTTTA CTCTTGCATT CTTCCCAGCA ACAATATGGC 
451 TTTTTCTTCA GCTTTTATCA ATCACACCCA TCAACGAATG GCTTTTAAAA 
501 GGTTTGCAGA CAGTAGGTTG CATGCCTCCG CCTGTGTCTT CTGCAGTGAT 
551 TTTAACCAAG GCAGTTGGTG GAAATGAGGC AGCTGCAATA TTTAATTCAG 
601 CCTTTGGAAG TTTTTTGGTA AGTAAACATA GTTTAACTTG TCTATTACAA 
651 CTTTTGCTGT GATATTGTGT ATATGAAAGA TTTAGTGAAA GCTGGATTTG 
701 TTTTACTCTT TGGTTAAGTA TAAAAATTGT TGAATCTTTT CATGTGCCAG 
751 TATCCATACC CTGAAGAAAA GTAGTTAATG AATAAAGCAA ATGTTCTCTT 
801 ACAATATATT TTGGAGGTTT GGATTTTAAA ATTCCATTTA ATGAATTCAA 
851 GGAATCAATT AAAACACTAT GTGTCTCCTT ATAGAGGTTA TGTCAATATA 
901 TTGATCATTT AATGAGGTCT TTTAGATTAT TATTATTTTG TATCATGGGA 
951 CTGAGGATTT TGAAAAGGAA ACATGACCCA GCTGGTCAGA AAGGGAATGC 
1001 TAATTTACTT GTTGACATGC CATTTATTTT GTACATTTCA CTGTCAAAGA 
1051 AGCTACTGGC TTGGATGCTT CTGAGAAATC TATGTGAGAA AAAATTTGAA 
1101 AGGAAGATAT GACTAATGAG TAATTTGCAA GTAAATGTTG TATCTATATA 
1151 TATATATATA TAAAGATTCA AAAGTAGTTC AGCTTTCATA AGTAGAACCA 
1201 ATATAAGGAC GTTGTTTTAG CATTTTTAAT CATTATTTTT AAATAAATGA 
1251 TGTAACAGAG GCTTGATTTG TGTTATGAAA GATTGAGAAA CTAAATTTTC 
1301 TGTTGATTTA ATTTTTTTGT GCCTTAAAAC TTTGTTAAAT TCCTGAAGTT 
1351 AATTATCATA TTGTACTTTT TGGGGCATAA CTCATTAGCA GATATGTAGT 
1401 GCAGTGATTT ACAAATAATT GAGAGTAAAA TCAGTGATGT ATAAACTAGT 
1451 TCATGAGTCT AGGTAAAATA TCAATTACCT CTGTTTAAAA TGCTCTGTTA 
1501 ATTATTATTG TATGTATTTA AATGTAGTTA AAGCTTTTAA ACATGTTGTT 
1551 ACATAGTGTT AATTCTACAC AGTGCTACAC AGCTTTTAGT GTCACATAGC 
1601 CTTACAGAGT TTATAATGAT GTAGCATCTG CAAAATATAT GCATAGCTTA 
1651 TATCCTATTT TTATAGAGCC AGTAATGGTT TTTGTGATGC TGTATTACTT 
1701 CTGGGTTTTA GACAATAAAG TCTGTTTAAC AAAAAAAAAA AAAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 
ORF from 183 bp to 659 bp; peptide length: 159
Category: similarity to unknown protein 


1 MRLLERMRKD WFMVGIVLAI AGAKLEPSIG VNGGPLKPEI TVSYIAVATI 
51 FFNSGLSLKT EELTSALVHL KLHLFIQIFT LAFFPATIWL FLQLLSITPI 
101 NEWLLKGLQT VGCMPPPVSS AVILTKAVGG NEAAAIFNSA FGSFLVSKHS
151 LTCLLQLLL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4m11, frame 3

PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 171, P = 3.2e-12

PIR:A65015 yfeH protein - Escherichia coli (strain K-12), N = 1, Score
= 131, P = 4.2e-08


>PIR:S53951 probable membrane protein YMR034C - yeast (Saccharomyces 
cerevisiae) 

Length = 434 

HSPs: 

Score = 171 (25.7 bits), Expect = 3.2e-12, P = 3.2e-12

Identities = 38/144 (26%), Positives = 72/144 (50%) 

Query: 5 ERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKTEELT 64

E ++ WF + + + I A+ P+ +GG +K + ++ Y VA IF SGL +K+ L 
Sbjct: 18 EFLKSQWFFICLAILIVIARFAPNFARDGGLIKGQYSIGYGCVAWIFLQSGLGMKSRSU1 77 

Query: 65 SALVHLKLHLFIQIFTLAFFPATIWLF---LQLLSITPINEWLLKGLQTVGCMPPPVSSA 121

+ +++ +HI++ +++F +++ I++W+L GL P V+S 

Sbjct: 78 ANMLNWRAHATILVLSFLITSSIVYGFCCAVKAANDPKIDDWVLIGLZLTATCPTTVASN 137 

Query: 122 VILTKAVGGNEAAAIFNSAFGSFL 145 

VI+T GGN + G+ L 

Sbjct: 138 VIMTTNAGGNSLLCVCEVFIGNLL 161 


Pedant information for DKFZphfkd2_4m11, frame 3


Report for DKFZphfkd2_4m11.3


[LENGTH] 159
[MW] 17282.92
[pI] 9.06
[HOMOL] PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces cerevisiae) 5e-12
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR034c] 2e-13
[PROSITE] MYRISTYL 2
[PROSITE] PKC_PHOSPHO_SITE 1
[KW] TRANSMEMBRANE 4


SEQ MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT
PRD ccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeeeeccccccccccchhhh
MEM . , . . .MMMMMMMMMMMMMMMMMMMMMMMMMM. . MMMMMMMMMMMMMMMMMMMMMM . .


SEQ EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS
PRD hhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhhhheeeecccccccc
MEM NMMMMMMMMMMMMMMMMMMMMMMMMMMMM


SEQ AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL
PRD ceeeeeccccchhhhhhhcccccceeecceeeeeeeccc
MEM . MMMMMMMMMMMMMMMMMMMMMMMMMMM
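
The report does not say how the MEM track was computed. As a rough, independent cross-check, hydrophobic stretches can be located with a Kyte-Doolittle sliding-window average; this is a different and much simpler heuristic than a dedicated membrane-topology predictor, and the 19-residue window is a conventional choice, not a value taken from this document. All names below are hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[res] for res in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Peptide of DKFZphfkd2_4m11.3 as given in the SEQ rows above.
PEPTIDE = (
    "MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT"
    "EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS"
    "AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL"
)

profile = hydropathy_profile(PEPTIDE)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}")
```

Windows with a high mean are candidates for membrane-spanning helices; the exact cut-off and the number of segments called depend on the method, so agreement with the four regions reported here should not be assumed.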


Prosite for DKFZphfkd2_4m11.3

PS00005 57->60 PKC_PHOSPHO_SITE PDOC00005

PS00008 15->21 MYRISTYL PDOC00008

PS00008 129->135 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfkd2_4m11.3)
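
The Prosite hits listed for this peptide can be reproduced with ordinary regular expressions. The sketch below assumes the usual PROSITE patterns for these two entries, `[ST]-x-[RK]` for PS00005 and `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}` for PS00008, which should be checked against the PROSITE entries themselves. It also assumes that the `start->end` notation gives the 1-based start and the start plus the motif length, which is what the listed spans suggest. Matches are reported without overlaps; the function name is hypothetical.

```python
import re

# PROSITE patterns hand-translated to regular expressions (assumed, see above).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

# Peptide of DKFZphfkd2_4m11.3 (159 residues).
PEPTIDE = (
    "MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT"
    "EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS"
    "AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL"
)


def scan(motif: str, seq: str) -> list:
    """Return (start, end) pairs in the report's 'start->end' convention."""
    return [
        (m.start() + 1, m.end() + 1)
        for m in re.finditer(PATTERNS[motif], seq)
    ]


for name in PATTERNS:
    for start, end in scan(name, PEPTIDE):
        print(f"{start}->{end} {name}")
```

Because `re.finditer` skips overlapping matches, a scanner that reports overlaps could list additional sites for the same pattern.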
DKFZphute1_17k7


group: uterus derived

DKFZphute1_17k7 encodes a novel 520 amino acid protein with weak similarity to S. cerevisiae
Fip1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of uterus-specific
genes.


similarity to S. cerevisiae Fip1

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 1914 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1867 


1 CGGACGCGTG GGCGGACGCG TGGGGCCTTC CTGGGATTGG AGTCTCGAGC 
51 TTTCTTCGTT CGTTCGCCGG CGGGTTCGCG CCCTTCTCGC GCCTCGGGGC 
101 TGCGAGGCTG GGGAAGGGGT TGGAGGGGGC TGTTGATCGC CGCGTTTAAG 
151 TTGCGCTCGG GGCGGCCATG TCGGCCGGCG AGGTCGAGCG CCTAGTGTCG 
201 GAGCTGAGCG GCGGGACCGG AGGGGATGAG GAGGAAGAGT GGCTCTATGG 
251 CGATGAAAAT GAAGTTGAAA GGCCAGAAGA AGAAAATGCC AGTGCTAATC 
301 CTCCATCTGG AATTGAAGAT GAAACTGCTG AAAATGGTGT ACCAAAACCG 
351 AAAGTGACTG AGACCGAAGA TGATAGTGAT AGTGACAGCG ATGATGATGA 
401 AGATGATGTT CATGTCACTA TAGGAGACAT TAAAACGGGA GCACCACAGT 
451 ATGGGAGTTA TGGTACAGCA CCTGTAAATC TTAACATCAA GACAGGGGGA 
501 AGAGTTTATG GAACTACAGG GACAAAAGTC AAAGGAGTAG ACCTTGATGC 
551 ACCTGGAAGC ATTAATGGAG TTCCACTCTT AGAGGTAGAT TTGGATTCTT 
601 TTGAAGATAA ACCATGGCGT AAACCTGGTG CTGATCTTTC TGATTATTTT
651 AATTATGGGT TTAATGAAGA TACCTGGAAA GCTTACTGTG AAAAACAAAA 
701 GAGGATACGA ATGGGACTTG AAGTTATACC AGTAACCTCT ACTACAAATA 
751 AAATTACGGT ACAGCAGGGA AGAACTGGAA ACTCAGAGAA AGAAACTGCC 
801 CTTCCATCTA CAAAAGCTGA GTTTACTTCT CCTCCTTCTT TGTTCAAGAC 
851 TGGGCTTCCA CCGAGCAGGA GATTACCTGG GGCAATTGAT GTTATCGGTC 
901 AGACTATAAC TATCAGCCGA GTAGAAGGCA GGCGACGGGC AAATGAGAAC 
951 AGCAACATAC AGGTCCTTTC TGAAAGATCT GCTACTGAAG TAGACAACAA 
1001 TTTTAGCAAA CCACCTCCGT TTTTCCCTCC AGGAGCTCCT CCCACTCACC 
1051 TTCCACCTCC TCCATTTCTT CCACCTCCTC CGACTGTCAG CACTGCTCCA 
1101 CCTCTGATTC CACCACCGGG TTTTCCTCCT CCACCAGGCG CTCCACCTCC 
1151 ATCTCTTATA CCAACAATAG AAAGTGGACA TTCCTCTGGT TATGATAGTC 
1201 GTTCTGCACG TGCATTTCCA TATGGCAATG TTGCCTTTCC CCATCTTCCT 
1251 GGTTCTGCTC CTTCGTGGCC TAGTCTTGTG GACACCAGCA AGCAGTGGGA 
1301 CTATTATGCC AGAAGAGAGA AAGACCGAGA TAGAGAGAGA GACAGAGACA 
1351 GAGAGCGAGA CCGTGATCGG GACAGAGAAA GAGAACGCAC CAGAGAGAGA 
1401 GAGAGGGAGC GTGATCACAG TCCTACACCA AGTGTTTTCA ACAGCGATGA 
1451 AGAACGATAC AGATACAGGG AATATGCAGA AAGAGGTTAT GAGCGTCACA 
1501 GAGCAAGTCG AGAAAAAGAA GAACGACATA GAGAAAGACG ACACAGGGAG 
1551 AAAGAGGAAA CCAGACATAA GTCTTCTCGA AGTAATAGTA GACGTCGCCA 
1601 TGAAAGTGAA GAAGGAGATA GTCACAGGAG ACACAAACAC AAAAAATCTA 
1651 AAAGAAGCAA AGAAGGAAAA GAAGCGGGCA GTGAGCCTGC CCCTGAACAG 
1701 GAGAGCACCG AAGCTACACC TGCAGAATAG GCATGGTTTT GGCCTTTTGT 
1751 GTATATTAGT ACCAGAAGTA GATACTATAA ATCTTGTTAT TTTTCTGGAT 
1801 AATGTTTAAG AAATTTACCT TAAATCTTGT TCTGTTTGTT AGTATGAAAA 
1851 GTTAACTTTT TTTCCAAAAT AAAAGAGTGA ATTTTTCATG TTAAGTTAAA 
1901 AAAAAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 
Peptide information for frame 3


ORF from 168 bp to 1727 bp; peptide length: 520 
Category: similarity to known protein 


1 MSAGEVERLV SELSGGTGGD EEEEWLYGDE NEVERPEEEN ASANPPSGIE 
51 DETAENGVPK PKVTETEDDS DSDSDDDEDD VHVTIGDIKT GAPQYGSYGT 
101 APVNLNIKTG GRVYGTTGTK VKGVDLDAPG SINGVPLLEV DLDSFEDKPW 
151 RKPGADLSDY FNYGFNEDTW KAYCEKQKRI RMGLEVIPVT STTNKITVQQ 
201 GRTGNSEKET ALPSTKAEFT SPPSLFKTGL PPSRRLPGAI DVIGQTITIS
251 RVEGRRRANE NSNIQVLSER SATEVDNNFS KPPPFFPPGA PPTHLPPPPF
301 LPPPPTVSTA PPLIPPPGFP PPPGAPPPSL IPTIESGHSS GYDSRSARAF
351 PYGNVAFPHL PGSAPSWPSL VDTSKQWDYY ARREKDRDRE RDRDRERDRD
401 RDRERERTRE RERERDHSPT PSVFNSDEER YRYREYAERG YERHRASREK
451 EERHRERRHR EKEETRHKSS RSNSRRRHES EEGDSHRRHK HKKSKRSKEG
501 KEAGSEPAPE QESTEATPAE 

BLASTP hits 
Entry AF016427_4 from database TREMBL:

gene: "F32D1.9"; Caenorhabditis elegans cosmid F32D1.

Score = 392, P = 1.8e-36, identities = 156/519, positives = 212/519

Entry S62454 from database PIR:

hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces
pombe)

Score = 246, P = 2.0e-22, identities = 62/163, positives = 91/163

Entry A56545 from database PIR:

FIP1 protein - yeast (Saccharomyces cerevisiae)

Score = 186, P = 2.9e-16, identities = 56/206, positives = 92/206


Alert BLASTP hits for DKFZphutel_17k7, frame 3 

TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial
cds; PS1 and hypothetical protein genes, complete cds; and S171 gene,
partial cds., N = 2, Score = 236, P = 1.5e-16


>TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial cds;
PS1 and hypothetical protein genes, complete cds; and S171 gene, partial
cds.

Length = 735

HSPs:

Score = 236 (35.4 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16
Identities = 51/120 (42%), Positives = 76/120 (63%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYA---ER 439

REK+^+RER+R+R-^RDRDR +ER+R R^-RER+RD S + +++R R RE + ER 

Sbjct: 227 REKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSS-DRNKDRSRSREKSRDRER 285 

Query: 440 GYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSK 498 

ER R + ER RER R RE+E R + + +R E +E D++ R K ++ R K 
Sbjct: 286 EREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREK 345 

Query: 499 E 499 
E 

Sbjct: 346 E 346 

Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14
Identities = 50/133 (37%), Positives = 75/133 (56%)

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440 

RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER 

Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266 

Query: 441 YERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSKEG 500 

+R++ E+ R+R RE+E R+R RR E+R+++KK 

Sbjct: 267 SDRNKDRSRSREKSRDRE-RERERERERE-REREREREREREREREREREREREKDKKRD 324 

Query: 501 KEAGSEPAPEQESTE 515

+E E A E+ E
Sbjct: 325 REEDEEDAYERRKLE 339 
Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14
Identities = 55/141 (39%), Positives = 80/141 (56%)

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440
           RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER
Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266

Query: 441 YERHR-ASREKEE-RHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497
           +R++ SR +E+ R RER R RE+E R+ REE RKKK R
Sbjct: 267 SDRNKDRSRSREKSRDREREREREREREREREREREREREREREREREREREKDKKRDRE 326

Query: 498 KEGKEAGSEPAPEQESTEATPA 519
           ++ ++A E++ E A
Sbjct: 327 EDEEDAYERRKLERKLREKEAA 348


Score = 210 (31.5 bits), Expect = 1.2e-13, Sum P(2) = 1.2e-13
Identities = 59/142 (41%), Positives = 78/142 (54%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS---DEERYRYREYAER 439
           RE++RDR+RDR +ERDRDRDRER+R R+RER D+ S DERRREER
Sbjct: 235 RERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERERERE-RER 293

Query: 440 GYERHRA-SREKE-ERHRER-RHREKEETRHKSS-------RSNSRRRHESEEGDSHRRH 489
           ER R RE+E ER RER R REK+ R R R+ +E R
Sbjct: 294 EREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREKEAAYQERL 353

Query: 490 KHKKSKRSKEGKEAGSEPAPEQE 512
           K+ + + K+ +E E E+E
Sbjct: 354 KNWEIRERKKTREYEKEAEREEE 376


Score = 205 (30.8 bits), Expect = 4.4e-13, Sum P(2) = 4.4e-13
Identities = 59/149 (39%), Positives = 83/149 (55%)

Query: 372 DTSKQWDYYARREKDRDR--ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429
           + K+ + R++DRDR ERDRDR+R+RDRDR+RER+ +R ++R S S D E
Sbjct: 228 EKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKS---RDRE 284

Query: 430 RYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKSS-------RSNSRRRHE 479
           R R RE ER ER R RE+E ER RER R REK++ R + R R+
Sbjct: 285 RERERE-REREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLR 343

Query: 480 SEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           +E R K+ + + K+ +E E E+E
Sbjct: 344 EKEAAYQERLKNWEIRERKKTREYEKEAEREEE 376


Score = 202 (30.3 bits), Expect = 9.6e-13, Sum P(2) = 9.6e-13
Identities = 49/117 (41%), Positives = 70/117 (59%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442
           REK RDRER+R+RER+R+R+RERER RERERER+ D++R R E E YE
Sbjct: 277 REKSRDRERERERERERERERERERERERERERERERERER-EKDKKRDR-EEDEEDAYE 334

Query: 443 RHRASREKEERHRERRHREKEETRHKSSRSNSRR-RHESEEGDSHRRHKHKKSKRSKE 499
           R + E++ R +E ++E+ + R +R E+E + RR K++KR KE
Sbjct: 335 RRKL--ERKLREKEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE 390

Score = 183 (27.5 bits), Expect = 1.2e-10, Sum P(2) = 1.2e-10
Identities = 52/141 (36%), Positives = 79/141 (56%)

Query: 372 DTSKQWDYY-ARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429
           DT K+ + ++EK+R E++R RER+R+R+RERER RERERER+ ++E
Sbjct: 178 DTHKKLEEEKGKKEKERQEIEKER-RERERERERERER-RERERERERER-----EREKE 230

Query: 430 RYRYREYAERGYERHRASREKEERHRER--RHREKEETRHKSSRSNSRRRHESEEGDSH 486
           + R RE ER +R R +R RER R RE+ R+K RS SR + E +
Sbjct: 231 KERERE-RERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERERE 288

Query: 487 RRHKHKKSKRSKEGKEAGSEPAPEQE 512
           R + ++ + + +E E E+E
Sbjct: 289 RERERERERERERERERERERERERE 314


Score = 171 (25.7 bits), Expect = 2.5e-09, Sum P(2) = 2.5e-09
Identities = 49/150 (32%), Positives = 78/150 (52%)

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442
           RE++R+RER+R+RER+R+R+RERER RERERER+ +E+ Y R+ + E
Sbjct: 285 REREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLRE 344

Query: 443 RHRASREK EERHRERRHR EKEETRHKSSRSNSRRRHES-EEGDSHRRH-KH 491
           + A +E+ ER + R + E+EE R + ++R E E+ D R K+
Sbjct: 345 KEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKY 404

Query: 492 KKSKRSKEGKEAGSEPAPEQESTE 515
           +K R +E + E ++E E
Sbjct: 405 YRGSALQKRLRDREKEMEADERDRKREKEE 434

Score = 162 (24.3 bits), Expect = 2.4e-08, Sum P(2) = 2.4e-08
Identities = 45/141 (31%), Positives = 74/141 (52%)

Query: 372 DTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERY 431
           + SK D + + E+++ ++ +E +++R RERER RERERER + ER
Sbjct: 172 EISKFRDTHKKLEEEKGKKEKERQEIEKER-RERERERERERERRERERER--ERERERE 228

Query: 432 RYREYAERGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHK 490
           + +E ER ER R +ER R+R R R+ + + R +SS N R E+ R +
Sbjct: 229 KEKE-RERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERER 287

Query: 491 HKKSKRSKEGKEAGSEPAPEQE 512
           ++ +R +E +E E E+E
Sbjct: 288 ERERERERE-RERERERERERE 308


Score = 137 (20.6 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
Identities = 48/152 (31%), Positives = 68/152 (44%)

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422
           AP P + T + + E RD R+ + RD + E E+ + +E+ER
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKER 201

Query: 423 VFNSDEERYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKS-SRSNSRRRH 478
           + ER R RE ER ER R REKE ER RER R R+++ T+ + R R R
Sbjct: 202 R-ERERERERERERREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRD 260

Query: 479 ESEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           E S R +S+ +E E E+E
Sbjct: 261 RDRERSSDRNKDRSRSREKSRDRERERERERERE 294


Score = 126 (18.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04
Identities = 41/149 (27%), Positives = 66/149 (44%)

Query: 375 KQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT PSVFNSD-- EE 429
           K W+ R+K R+ E++ +RE +R R+ +E R +E D+ P + ++
Sbjct: 354 KNWEI-RERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKYYRGSALQK 412

Query: 430 RYRYREYAERGYERHRASREKEERHRERR------HREKEETRHKSSRSNSRRRHES--E 481
           R R RE ER R REKEE R+ H + + + + RRR +
Sbjct: 413 RLRDREKEMEADERDR-KREKEELEEIRQRLLAEGHPDPDAELQRMEQEAERRRQPQIKQ 471

Query: 482 EGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512
           E +S + K+ K K + E PEQ+
Sbjct: 472 EPESEEEEEEKQEKEEKREEPMEEEEEPEQK 502


Score = 124 (18.6 bits), Expect = 3.0e-04, Sum P(2) = 3.0e-04
Identities = 41/141 (29%), Positives = 65/141 (46%)

Query: 380 YARREKDRD-RERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAE 438
           Y R K+ + RER + RE +++ +RE ER RE +E + + D++R + Y
Sbjct: 349 YQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE-FLEDYDDDRDDPKYYRG 407

Query: 439 RGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497
           ++ REKE ER R REKEE R + H ++R + ++R
Sbjct: 408 SALQKRLRDREKEMEADERDRKREKEELEEIRQRLLAEG-HPDPDAELQRMEQEAERRRQ 466

Query: 498 KEGKEAGSEPAPEQESTEATPAE 520
           + K+ EP E+E E E
Sbjct: 467 PQIKQ---EPESEEEEEEKQEKE 486


Score = 121 (18.2 bits), Expect = 6.2e-04, Sum P(2) = 6.2e-04 
Identities = 43/149 (28%), Positives = 67/149 (44%) 

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422 
AP P + T + + E RD R+ + RD + E E+ + +E+ER 
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKE- 200 

Query: 423 VFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEE 482 
+ ER R RE R ER R RE+E + R RE+E R + R+ R R E 
Sbjct: 201 --RRERERERERERERRERERER-EREREREKEKERERERERDRDRD-RTKERDRDRDRE 256 

Query: 483 GDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 
D R + + S R+K+ + E + ++E 
Sbjct: 257 RDRDR-DRERSSDRNKD-RSRSREKSRDRE 284 


Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 25/73 (34%), Positives = 33/73 (45%) 

Query: 428 EERYRYREYAERGYERHRASREKE-ERHRERRHREKEETRHKSSRSNSRRRHESEEGDSH 486 
EE +E + E+ R RE+E ER RERR RE+E R + REE 
Sbjct: 184 EEEKGKKEKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDR 243 

Query: 487 RRHKHKKSKRSKE 499 
R K + R +E 
Sbjct: 244 DRTKERDRDRDRE 256 


Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 31/87 (35%), Positives = 45/87 (51%) 

Query: 382 RREKDRDRERDRDRERDRDRDRER-ERTRERERERDHSPTPSVFNSDEERYRYREYAERG 440 
+R +DR++E + D ERDR R++E E R+R H P P D E R + AER 
Sbjct: 412 KRLRDREKEMEAD-ERDRKREKEELEEIRQRLLAEGH-PDP DAELQRMEQEAERR 464 

Query: 441 YERHRASREKEERHRERRHREKEETRHK 468 
+ + +E E E +EKEE R + 
Sbjct: 465 -RQPQIKQEPESEEEEEEKQEKEEKREE 491 


Score = 46 (6.9 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 
Identities = 13/49 (26%), Positives = 21/49 (42%) 

Query: 54 AENGVPKPKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAP 102 
A NG +P+ +D+ D + D + G I+ +Y S AP 
Sbjct: 70 ASNGNARPETVTNDDEEALDEETKRRDQMIK-GAIEVLIREYSSELNAP 117 

Score = 46 (6.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04 
Identities = 14/53 (26%), Positives = 21/53 (39%) 

Query: 30 ENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDEDDVH 82 
+ EERE EEE ++EED D ++DE+D + 
Sbjct: 282 DRERERERERERERERERERERER-EREREREREREREKDKKRDREEDEEDAY 333 

Score = 44 (6.6 bits), Expect = 2.0e-13, Sum P(2) = 2.0e-13 
Identities = 13/60 (21%), Positives = 21/60 (35%) 

Query: 20 DEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDED 79 
++E + + + EERE + E K+EEDDD +D 
Sbjct: 191 EKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDRDRTKERD 250 
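
The Identities and Positives figures quoted with each alignment above can be checked from the match line between Query and Sbjct: a letter marks an identical residue and a `+` marks a conservative substitution. The following is a minimal sketch, not the program that produced the report; the function names are hypothetical, and it assumes that percentages are truncated rather than rounded, which is what the figures above suggest (for example, 43/149 is reported as 28%).

```python
def percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated (assumed from the reported values)."""
    return count * 100 // aligned_length


def summarize_midline(midline: str, aligned_length: int) -> dict:
    """Count identities (letters) and positives (letters plus '+') in a match line."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return {
        "identities": identities,
        "positives": positives,
        "identity_pct": percent(identities, aligned_length),
        "positive_pct": percent(positives, aligned_length),
    }


# Figures taken from the alignments above.
print(percent(45, 141), percent(74, 141))   # 31 52
print(percent(43, 149), percent(67, 149))   # 28 44
```

Because the OCR of this document has collapsed the spacing of the match lines, the counting function is only meaningful on an alignment whose columns are intact.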


Pedant information for DKFZphutel_17k7, frame 3 


Report for DKFZphutel_17k7.3 


[LENGTH]   520 
[MW]       58375.30 
[pI]       5.41 
[HOMOL]    PIR:S62454 hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces pombe) 3e-18 
[FUNCAT]   04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YJR093c] 2e-13 
[FUNCAT]   30.10 nuclear organization [S. cerevisiae, YJR093c] 2e-13 
[PROSITE]  MYRISTYL 9 
[PROSITE]  AMIDATION 1 
[PROSITE]  CK2_PHOSPHO_SITE 18 
[PROSITE]  TYR_PHOSPHO_SITE 2 
[PROSITE]  PKC_PHOSPHO_SITE 12 
[PROSITE]  ASN_GLYCOSYLATION 2 
[KW]       Alpha_Beta 
[KW]       LOW_COMPLEXITY 35.00 % 


SEQ MSAGEVERLVSELSGGTGGDEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPK 

SEG xxxxxxxxxx 

PRD cccchhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAPVNLNIKTGGRVYGTTGTK 

SEG . , .xxxxxxxxxxxxxxxxx 

PRD cceeeecccccccccccccceeeeeccccccccccccccccceeeeeecccceeeccccc 

SEQ VKGVDLDAPGSINGVPLLEVDLDSFEDKPWRKPGADLSDYFNYGFNEDTWKAYCEKQKRI 

SEG 

PRD ceeeccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RMGLEVIPVTSTTNKITVQQGRTGNSEKETALPSTKAEFTSPPSLFKTGLPPSRRLPGAI 

SEG 
PRD hhhheeeeeccccceeeeeeecccccccccccccceeeeccccceeeecccccccccccc 

SEQ DVIGQTITISRVEGRRRANENSNIQVLSERSATEVDNNFSKPPPFFPPGAPPTHLPPPPF 

SEG xxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeeecccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ LPPPPTVSTAPPLIPPPGFPPPPGAPPPSLIPTIESGHSSGYDSRSARAFPYGNVAFPHL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccceeeccc 

SEQ PGSAPSWPSLVDTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx .... 

PRD ccccccccceeeccccchhhhhhhhhhccccccccccccccchhhhhhhhhhhhcccccc 

SEQ PSVFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHES 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ EEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQESTEATPAE 

SEG XX . .xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccc 


Prosite for DKFZphutel_17k7.3 


PS00001    40->44     ASN_GLYCOSYLATION    PDOC00001 
PS00001    278->282   ASN_GLYCOSYLATION    PDOC00001 
PS00005    169->172   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    193->196   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    206->209   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    214->217   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    233->236   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    268->271   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    346->349   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    373->376   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    469->472   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    474->477   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    485->488   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    494->497   PKC_PHOSPHO_SITE     PDOC00005 
PS00006    2->6       CK2_PHOSPHO_SITE     PDOC00006 
PS00006    17->21     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    47->51     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    64->68     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    66->70     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    70->74     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    72->76     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    74->78     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    84->88     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    144->148   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    206->210   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    215->219   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    250->254   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    271->275   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    273->277   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    340->344   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    369->373   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    426->430   CK2_PHOSPHO_SITE     PDOC00006 
PS00007    434->442   TYR_PHOSPHO_SITE     PDOC00007 
PS00007    152->161   TYR_PHOSPHO_SITE     PDOC00007 
PS00008    15->21     MYRISTYL             PDOC00008 
PS00008    96->102    MYRISTYL             PDOC00008 
PS00008    115->121   MYRISTYL             PDOC00008 
PS00008    130->136   MYRISTYL             PDOC00008 
PS00008    154->160   MYRISTYL             PDOC00008 
PS00008    229->235   MYRISTYL             PDOC00008 
PS00008    244->250   MYRISTYL             PDOC00008 
PS00008    289->295   MYRISTYL             PDOC00008 
PS00008    362->368   MYRISTYL             PDOC00008 
PS00009    253->257   AMIDATION            PDOC00009 


(No Pfam data available for DKFZphutel_17k7.3) 

DKFZphutel_18cl2 


group: uterus derived 

DKFZphutel_18cl2 encodes a novel 378 amino acid protein nearly identical to human 
WUGSC:H_DJ0872F07.1 protein. 

The novel protein has an additional N-terminal domain, which is not present in 
WUGSC:H_DJ0872F07.1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 


nearly identical to human WUGSC:H_DJ0872F07.1 protein 

on genomic level encoded by AC004537, 10 exons; the predicted 
protein sequence AC004537_1 is only partially o.k.: the first exon wasn't 
predicted, and there are additional exons predicted 
(BLASTX/EST-BLAST shows that the cDNA is only partly spliced) 
intron -1216-3540//-3577-5059 

Sequenced by AGOWA 

Locus: /map="7q31" 

Insert length: 6005 bp 

Poly A stretch at pos. 5980, polyadenylation signal at pos. 5968 


1 AGCGGGTGCT GCTAGCGGAG GCGCCATATT GGAGGGGACA AAACTCCGGC 
51 GACAGCGAGT GACACAAATA AACCCCTGGA CCCCCTTGTT CCCTCAGCTC 
101 TAAGGGCCGC GATGTTGTAC CTAGAAGACT ATCTGGAAAT GATTGAGCAG 
151 CTTCCTATGG ATCTGCGGGA CCGCTTCACG GAAATGCGCG AGATGGACCT 
201 GCAGGTGCAG AATGCAATGG ATCAACTAGA ACAAAGAGTC AGTGAATTCT 
251 TTATGAATGC AAAGAAAAAT AAACCTGAGT GGAGGGAAGA GCAAATGGCA 
301 TCCATCAAAA AAGACTACTA TAAAGCTTTG GAAGATGCAG ATGAGAAGGT 
351 TCAGTTGGCA AACCAGATAT ATGACTTGGT AGATCGACAC TTGAGAAAGC 
401 TGGATCAGGA ACTGGCTAAG TTTAAAATGG AGCTGGAAGC TGATAATGCT 
451 GGAATTACAG AAATATTAGA GAGGCGATCT TTGGAATTAG ACACTCCTTC 
501 ACAGCCAGTG AACAATCACC ATGCTCATTC ACATACTCCA GTGGAAAAAA 
551 GGAAATATAA TCCAACTTCT CACCATACGA CAACAGATCA TATTCCTGAA 
601 AAGAAATTTA AATCTGAAGC TCTTCTATCC ACCCTTACGT CAGATGCCTC 
651 TAAGGAAAAT ACACTAGGTT GTCGAAATAA TAATTCCACA GCCTCTTCTA 
701 ACAATGCCTA CAATGTGAAT TCCTCCCAAC CTCTGGGATC CTATAACATT 
751 GGCTCGTTAT CTTCAGGAAC TGGTGCAGGG GCAATTACCA TGGCAGCTGC 
801 TCAAGCAGTT CAGGCTACAG CTCAGATGAA GGAGGGACGA AGAACATCAA 
851 GTTTAAAAGC CAGTTATGAA GCATTTAAGA ATAATGACTT TCAGTTGGGA 
901 AAAGAATTTT CAATGGCCAG GGAAACAGTT GGCTATTCAT CATCTTCGGC 
951 ACTTATGACA ACATTAACAC AGAATGCCAG TTCATCAGCA GCCGACTCAC 
1001 GGAGTGGTCG AAAGAGCAAA AACAACAACA AGTCTTCAAG CCAGCAGTCA 
1051 TCATCTTCCT CCTCCTCTTC TTCCTTATCA TCGTGTTCTT CATCATCAAC 
1101 TGTTGTACAA GAAATCTCTC AACAAACAAC TGTAGTGCCA GAATCTGATT 
1151 CAAATAGTCA GGTTGATTGG ACTTACGACC CAAATGAACC TCGATACTGC 
1201 ATTTGTAATC AGGTAAAAGT CTGTTATATC TATAAAAGTA TAATCTGAAT 
1251 AAACTAGAAG GAAGAGAACT ATTTCATTTT TAAGCACTTT TTTAAACTCA 
1301 CTTAAAATAC CTTTGCTTTA TTTGTATACT TTTCTCCCCC TTCTTACAAA 
1351 AGTGACATTT GCTGTAAATA CTGAGTATAA AGAAAAATGT TACCCATAAT 
1401 CCTAGCCCTC AGATACAACC TGTAACTAAA CATTTTTGGT ATACCACTAC 
1451 CATATACCTC ATGTGCACAT TGGCTGCCTT AATAAAATAC AACAGACTGG 
1501 GTAGCTTAAA CAACAGAAAA TMTTTTCTC ACAGGTATGA AGGCTGGGAA 
1551 GTCCAAGATC AAGGTGTCCA CTGACTCAGT TCTGGAGGAG GGCTCCCTTC 
1601 CTAGATGGAG ACTGCTGCCT TCTCACCGGG TCCTCACATG ATAGAGGGAG 
1651 AAAGAGTGTG CTCTGGTGTC TTTTCTTATA AGGGCACCAG CCTTGTCAGA 
1701 GTAGGACCCC ACTCTATGAC CTCATTTAAC CTTTACCACC TCCTCACAGG 
1751 CCCTGTTTCC AATTATAGTC ACGTTGGGGG TTAGGGCTTC AACATATGAT 
1801 TTTGAGACAT AAGCTTGCAT TTCATAACAC GTGTCTATGC AGATTTGCAC 
1851 ATGCATGTGT GTATAAGTTT GTCAGTAGGA ACCACAGTGT ATACTTTCTT 
1901 GTTACTGGCT TTTTTCTCTA AATCAGGTAT ACCGAACATG ATTTTTCTTT 
1951 AAGATCATAT TTTTAATTTT CACATAGTTA TCTCTTATGC CATCCAGTGT 
2001 AGTTTTCTTA ACCAATACCT AGCTATAGAT TATATTAGTG GTTTTAATTT 
2051 GTTTGAAATT AGGGATAATA TTACGATAGG CATTTTTTAA ATGTAATCCA 
2101 TTTTATACAT CTAATTTCTT GGATAATCTT TTAGAAATAA AATTAGGCTG 
2151 TAAATATTTG ACAGACACCA AAATATATTT TCTAGAAATT TATTACCAAA 
2201 AATTAATAAA CATACCGGTT TACTAAACCC TGTCCAACAC TGGATATTAT 
2251 TTTCTTTTAA AAACTAAGTA CCAATTTGGT AGTTTTATAT TATGATTGTT 
2301 TTAAATACAC TAGTATTATT GAAGTTGGAC ATTTTTTGAC CATTTTTGTT 
2351 TTTTACATTA TGAATCGACT CCTAATGGTG TCGGCTGATT TTTCTATTGT 
2401 TTTTGTTATG TACTCTAAAT ATTTGCTTGA TTTAGTTTTT TAAAAATAAT 
2451 TCTAAAATTT TAATTTTATG TAGTTATGAC TGTTAATTTT TTTTTATGAA 
2501 GCAAGCCATG GATTATATAC TTAGAAGGGC TTTCTCTTTG GCTCTTCTTT 
2551 CTACAAAAAA TTGTCTTGTA TAATATTTTC TCCTAGTTTT TATATGGTTT 
2601 TGTCTAGTTC TTTGCATGCT TCAGTTTCTT CACATTTAAG ACTTAGTCTA 
2651 TCAGCAGATT ATTGTGTCTA ACAGTATGAG TTGCCAGTCT GATTTTTAAA 
2701 AATTTTAACA ATTTGTTAGC TGTTCCACTA TCACCCGATA AACATTTTTC 
2751 AGTACAAATG ATAGAAAAGC ATATCCTGTA TCCTGACAAC AAAAGTAGAT 
2801 TACTTGCAAA AGAACAAAAT CAGACTGAAC CTAGAGTTTT CCTCTGTAAC 
2851 ACTAAAAAAC TAGAAGGTGA TGGAATATGT CTGTAGAGCT TTCAGGGAAA 
2901 AATTAAGAGC CCCCAAAAAC TTGATATTCA GAGAAGTTAT TTCTCTGCAT 
2951 AGGACCATGT AAATATATTT TCACTCATGC AGAGAATCAG AAGATATGCC 
3001 ATCTAGTTAA TCCTGTCTGA AAAATTATTC AATCCACTGA GAACTTCAGT 
3051 GAACTCAAGA ATTAGCAAGT TATGCCCTAA AGTGCTGGTG ATGAAGAGCA 
3101 AAAGAAAAAT GAGAAAGGAC ATAAAATAGA TAAGTTTAGA AGTTTCAAGG 
3151 AAGGAGACTA TTAATTGCAA AAATATATAT GACCTAATGT GACCCAAGAA 
3201 GTAAAAACTT TCAGTAAGTA AATAATCAAG AAAGGAACTT AAAATTTTTA 
3251 CAATAAGAAC TACCCAGAAA GATGACTCCT TCATCCGGGT GATTTATATG 
3301 TCAAGTTCTT CCAGACTTCT GAAGGGCAGA TAATTCCTGT GCATTTCTTC 
3351 CCACCCTTGC CCCACCCTGC CCAAAAGAGT ATTTCAGGAA AAAATTATTA 
3401 TACCTTGATT CTCAATGTT^ TTGTATATTC AGTGTATTTC CCTTTATTTT 
3451 CCAGCAGTAT CATACATAAA CAGTTAATTG GTATCTAGGT GTTTGTTACA 
3501 TAGTCATAAT AAAGACATTT AATTTTTTTT AACTAGGTAT CTTATGGTGA 
3551 GATGGTGGGA TGTGATAACC AAGATGTAAG TATTACATTT TTCTATTTAG 
3601 gaatgaa;^ AATCACAGGT TGTTATTACT TGAATATTTG TCTTATTTGC 
3651 TGTATGGTTT ggtctaagaa AACAGGTTTG CAGGTATATT agttatgtta 
3701 tgctaatgct agaatattcc tcttcaaaat agggtagtgt cccttaatgt 
3751 gttccctatt ttaattttta aagctaattt tatggtttta tgtgcagatt 
3801 gtctcagaag tgttatgttg tatgaaaatt ataaataccc tcctttccct 
3851 ttactaaaaa atactgtgtt tactagaatc cagttcattt atcacattga 

3901 AGAAATGGAA TTTTAAAACA ATTCATTCTT TCAGGCTGCA CCGTGCTAAA 
3951 GTGAAGGGTG GGATAATTGA GGATCTAATG TGAGATTATC TTCCTCTCAT 
4001 GAGTATAATA TTTTTTCCTG TACTCTGCAG GTGTCAGCTG ATAAGAGCCA 
4051 CCCCTGATCT AAAAAGTAAA GGAAATTTGA AAGGAAGGAA TTCTTGGTTT 
4101 TTAGGAGACT TAATTTTAGT TAGAGATACG TTTTTTATTC AATACTGAGA 
4151 ATATTGTTGT CTAGTAATTT TGACTCCCTC CTTATTTAGT AGTGACAGGA 
4201 TCCTAAGATT AACAAGAGTT TTAAATTTGT AAAACAATCT GAAGATTGAG 
4251 GGAGCTGGCT AGGTGCATTA AAATGTGTAC TTTTCCTAGA CCTGATAGGG 
4301 TTACAGCAAC ATGCTCACGT AGATTGGGAC AGAGCCTCCT TCTGTTTCCC 
4351 TGTCTAGAAT CCCTTGTAGG CTGTTTGTGG TTGTTGCAAA AACAATATTG 
4401 CCCAACCATT TCAAGAACAT CACTGTAAAC TCTTCTGGGG CAGTTAGTGA 
4451 AAATGATGAA TGAGATTTCT ATGAGTACCA GCATCATGCT TCTCTGATTC 
4501 TTCTTATTCC CAGTTGTGCT CTTCTGAGTG CTAAGACTTT CATGAAAGAG 
4551 TTTTCTGCTT AATATGTTTC AAAGAGGAAT AATTTTTCTC TACATTTCAA 
4601 GGAATAGAAA CACCCACGTA GGAAATGCAG GGCATAAGAC ATAAATTAAT 
4651 GTCTTTAATT ACAATCAGCT TATTCTACTT TATGAGACAG CAAATAAGGC 
4701 TGACTATTAA ATAAAATCTT AAGTTATATT TACCTTCTAC ATAGAAGATT 
4751 CATCCCACTT CTTTTTGCCC TTGAAAGCTG AAAACTAGTG AATTTTCATT 
4801 CATTAGGATG AGGGGACTAG ATTACATGGA CCTCAGGATT CTTGAAGATG 
4851 CATAATTTTT CTGTGCCTTC ATTTCCTCAT TCCTGAAGCT TATCATTTAG 
4901 TCTAAATGAT GTCTAAATAA TCTAGATCTA AAAATTCTGA TGTCACACAT 
4951 CTAATTATTG TTAAATTAAA TGGATTATTC AGTCTCCTGA GCATATTTTA 
5001 ATATACTCTC TTGTCTTCAG AAGTACTGAA AACTTGTTTT TTGCAATTTT 
5051 GCTTTCTAGT GCCCTATAGA ATGGTTCCAT TATGGCTGCG TTGGATTGAC 
5101 AGAGGCACCA AAAGGCAAAT GGTACTGTCC ACAGTGCACT GCTGCAATGA 
5151 AGAGAAGAGG CAGCAGACAC AAATAAAGGT GGTCCTTTTG TTTGATGAAG 
5201 AAATAAACTT CAGCTGAAGA TTTTATATAG GACTTTAAAA AGAAGAGAAG 
5251 AGAAAGAAGA AACAATGCAT TTCCAGGCAA CCACTTAAAG GATTTACATA 
5301 GACAATCCTA TAAGATCTTG AACTTGAATT TTATGGGTTG TATTTTAATA 
5351 ATGTAAGTAA ATTATTTATG CACTCCTGGT GTGCTATGAA TATTATTCCA 
5401 GTTAGCCTTG GATTATTTCA GTGGCCAACA TATGCAGACA TTTGTACTCC 
5451 TCAACCATTT TCTCAAAGTA ATGGGCATTC TATGATTTAG ACTTCAAGGA 
5501 ATTCCAATGA TGAAGATTTT AAGGAAAGTA TTTTATATTC AACAGGTATA 
5551 TTCTGCTGCA TGTACTGTAC TCCAGAGCTG TTATGTAACA CTGTATATAA 
5601 ATGGTTGCAA AAAAAAAAAA AAGTCAGTGC TTCTAAAAAG AATTTAAGAT 
5651 AATGGTTTTT AAAATGCCTT TATAATAAGC TTTGTTTCTT TGTGAAACTA 
5701 ATTCAGCAGG CTGAAGGAAA TGGTTCATGT GATAATGTGG GCTGGTATCC 
5751 TCTAGAGTAC CTGGGTACAT AAACAGAAAC TCCTGTAGGT AAAAAGTAAT 
5801 TTGTGCCATT AGTCTTTCTA TGTTTCTGCA TCCAGATAGA GTGCAGTTCA 
5851 TGAGGGAGGG GGCGGGGGAC TGAAGGGGAA AGGGCGTTAA AGTGATACAT 
5901 TTTTATACCA AATGTGTTTA TTTTTTTGTG CAAGTAATCC TTAAAATTGC 
5951 AATTGTATTA GGTGTTAAAA TAAAGTTTTT AAAAAATTAA AAAAAAAAAA 
6001 AAAAA 
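
The poly A stretch and polyadenylation signal positions reported for this insert can be cross-checked against the last lines of the sequence listing. The sketch below is an illustration only: the function names are hypothetical, the choice of the hexamers AATAAA and ATTAAA as signals is an assumption about what the annotation tool searched for, and the reported positions are assumed to be 0-based offsets into the insert. Under those assumptions the signal nearest upstream of the poly A stretch lands on the reported position.

```python
SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and a common variant (assumed)


def find_signals(seq, offset=0):
    """Return sorted (position, motif) pairs for every signal hexamer in seq."""
    seq = seq.upper()
    hits = []
    for motif in SIGNALS:
        start = seq.find(motif)
        while start != -1:
            hits.append((offset + start, motif))
            start = seq.find(motif, start + 1)  # step by one to keep overlapping hits
    return sorted(hits)


def nearest_upstream_signal(seq, poly_a_start, offset=0):
    """Last signal hexamer that starts before the poly A stretch, or None."""
    upstream = [hit for hit in find_signals(seq, offset) if hit[0] < poly_a_start]
    return upstream[-1] if upstream else None


# Bases 5951-6005 of the listing above; base 5951 has 0-based offset 5950.
TAIL = ("AATTGTATTA" "GGTGTTAAAA" "TAAAGTTTTT" "AAAAAATTAA" "AAAAAAAAAA" "AAAAA")
print(nearest_upstream_signal(TAIL, poly_a_start=5980, offset=5950))
# (5968, 'AATAAA')
```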


BLAST Results 


Entry HSG20547 from database EMBL: 
HSG20547I human STS A005W09. 
Length = 154 
Minus Strand HSPs: 

Score = 770 (115.5 bits), Expect = 2.9e-26, P = 2.9e-26 
Identities = 154/154 (100%) 


Medline entries 


98101645: 

The candidate tumour suppressor p33ING1 cooperates with p53 in cell 
growth control. 


ORF from 112 bp to 1245 bp; peptide length: 378 
Category: similarity to known protein 


1 MLYLEDYLEM IEQLPMDLRD RFTEMREMDL QVQNAMDQLE QRVSEFFMNA 
51 KKNKPEWREE QMASIKKDYY KALEDADEKV QLANQIYDLV DRHLRKLDQE 
101 LAKFKMELEA DNAGITEILE RRSLELDTPS QPVNNHHAHS HTPVEKRKYN 
151 PTSHHTTTDH IPEKKFKSEA LLSTLTSDAS KENTLGCRNN NSTASSNNAY 
201 NVNSSQPLGS YNIGSLSSGT GAGAITMAAA QAVQATAQMK EGRRTSSLKA 
251 SYEAFKNNDF QLGKEFSMAR ETVGYSSSSA LMTTLTQNAS SSAADSRSGR 
301 KSKNNNKSSS QQSSSSSSSS SLSSCSSSST VVQEISQQTT VVPESDSNSQ 
351 VDWTYDPNEP RYCICNQVKV CYIYKSII 
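
The ORF coordinates and peptide length stated above are related by simple arithmetic, and the start of the peptide can be reproduced by translating the listed cDNA from position 112. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the ORF coordinates are taken to be 1-based and inclusive with the stop codon excluded, and the standard genetic code is used.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for a 1-based, inclusive ORF that excludes the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def translate(dna: str) -> str:
    """Translate complete codons of an unambiguous A/C/G/T sequence."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(peptide_length(112, 1245))                      # 378
print(translate("ATGTTGTACCTAGAAGACTATCTGGAAATG"))    # MLYLEDYLEM
```

The thirty bases in the usage line are positions 112 to 141 of the listing above; their translation matches the first ten residues of the peptide.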


Entry AF044076_1 from database TREMBL: 

"ING1"; product: "candidate tumor suppressor p33ING1"; Homo 
sapiens candidate tumor suppressor p33ING1 (ING1) mRNA, complete 
cds. Homo sapiens (human) 
Length = 279 

Score = 162 (57.0 bits), Expect = 1.1e-09, P = 1.1e-09 
Identities = 48/183 (26%), Positives = 92/183 (50%) 

Entry AC004537_1 from database TREMBL: 

gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from 
7q31, complete sequence. 

Score = 1814, P = 3.7e-187, identities = 358/358, positives = 358/358 

Entry CEY51H1A1 from database TREMBL: 

gene: "Y51H1A.4"; Caenorhabditis elegans cosmid Y51H1A 

Score = 213, P = 3.7e-15, identities = 37/123, positives = 82/123 


Peptide information for frame 1 


BLASTP hits 


Alert BLASTP hits for DKFZphutel_18cl2, frame 1 


No Alert BLASTP hits found 


Pedant information for DKFZphutel_18cl2, frame 1 


Report for DKFZphutel_18cl2.1 


[LENGTH]   378 
[MW]       42275.72 
[pI]       5.72 
[HOMOL]    TREMBL:AC004537_1 gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from 7q31, complete sequence. 1e-157 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YHR090c] 8e-05 
[FUNCAT]   04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04 
[PROSITE]  CAMP_PHOSPHO_SITE 1 
[PROSITE]  CK2_PHOSPHO_SITE 4 
[PROSITE]  PROKAR_LIPOPROTEIN 1 
[PROSITE]  GLYCOSAMINOGLYCAN 1 
[PROSITE]  PKC_PHOSPHO_SITE 3 
[PROSITE]  ASN_GLYCOSYLATION 5 
[PROSITE]  MYRISTYL 3 
[PROSITE]  AMIDATION 2 
[KW]       All_Alpha 
[KW]       LOW_COMPLEXITY 20.63 % 
[KW]       COILED_COIL 7.94 % 


SEQ MLYLEDYLEMIEQLPMDLRDRFTEMREMDLQVQNAMDQLEQRVSEFFMNAKKNKPEWREE 
SEG 
PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 
COILS 

SEQ QMASIKKDYYKALEDADEKVQLANQIYDLVDRHLRKLDQELAKFKMELEADNAGITEILE 
SEG 
PRD hhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ RRSLELDTPSQPVNNHHAHSHTPVEKRKYNPTSHHTTTDHIPEKKFKSEALLSTLTSDAS 
SEG 
PRD hhccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhcccc 
COILS 

SEQ KENTLGCRNNNSTASSNNAYNVNSSQPLGSYNIGSLSSGTGAGAITMAAAQAVQATAQMK 
SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . . 
PRD cccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhh 
COILS 

SEQ EGRRTSSLKASYEAFKNNDFQLGKEFSMARETVGYSSSSALMTTLTQNASSSAADSRSGR 
SEG xxxxxxxxxxxx 
PRD hccccccccchhhhhhccccccccccccccccccccccceeeeecccccccccccccccc 
COILS 

SEQ KSKNNNKSSSQQSSSSSSSSSLSSCSSSSTVVQEISQQTTVVPESDSNSQVDWTYDPNEP 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccceeecccccccccccccccccccccccccceeeecccccc 
COILS 

SEQ RYCICNQVKVCYIYKSII 
SEG 
PRD eeeeceeeeeeeeeeccc 
COILS 


Prosite for DKFZphutel_18cl2.1 


PS00001    190->194   ASN_GLYCOSYLATION     PDOC00001 
PS00001    191->195   ASN_GLYCOSYLATION     PDOC00001 
PS00001    203->207   ASN_GLYCOSYLATION     PDOC00001 
PS00001    288->292   ASN_GLYCOSYLATION     PDOC00001 
PS00001    306->310   ASN_GLYCOSYLATION     PDOC00001 
PS00002    218->222   GLYCOSAMINOGLYCAN     PDOC00002 
PS00004    243->247   CAMP_PHOSPHO_SITE     PDOC00004 
PS00005    64->67     PKC_PHOSPHO_SITE      PDOC00005 
PS00005    247->250   PKC_PHOSPHO_SITE      PDOC00005 
PS00005    298->301   PKC_PHOSPHO_SITE      PDOC00005 
PS00006    142->146   CK2_PHOSPHO_SITE      PDOC00006 
PS00006    156->160   CK2_PHOSPHO_SITE      PDOC00006 
PS00006    292->296   CK2_PHOSPHO_SITE      PDOC00006 
PS00006    349->353   CK2_PHOSPHO_SITE      PDOC00006 
PS00008    186->192   MYRISTYL              PDOC00008 
PS00008    214->220   MYRISTYL              PDOC00008 
PS00008    219->225   MYRISTYL              PDOC00008 
PS00009    241->245   AMIDATION             PDOC00009 
PS00009    298->302   AMIDATION             PDOC00009 
PS00013    315->326   PROKAR_LIPOPROTEIN    PDOC00013 


(No Pfam data available for DKFZphutel_18cl2.1) 
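
The Prosite hits listed above for this protein can be reproduced for the simpler motifs by scanning the peptide with regular expressions. The sketch below is illustrative rather than the PEDANT implementation: the function name is hypothetical, the three regular expressions are hand translations of the published PROSITE patterns PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]), and the `start->end` convention (1-based start, end equal to start plus motif length) is inferred from the table rather than documented in the source.

```python
import re

PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
}


def scan(seq: str, pattern: str, first_residue: int = 1):
    """Return (start, end) pairs for all matches, including overlapping ones.

    start is the 1-based residue number of the match; end is start plus the
    motif length, mirroring the 'start->end' column of the table above.
    """
    hits = []
    # A lookahead keeps the regex cursor in place so overlapping sites are found.
    for match in re.finditer(f"(?=({pattern}))", seq):
        start = match.start() + first_residue
        hits.append((start, start + len(match.group(1))))
    return hits


# Residues 181-210 of the peptide listed above.
fragment = "KENTLGCRNNNSTASSNNAYNVNSSQPLGS"
print(scan(fragment, PATTERNS["ASN_GLYCOSYLATION"], first_residue=181))
# [(190, 194), (191, 195), (203, 207)]
```

On this fragment the scan returns the first three ASN_GLYCOSYLATION rows of the table and no PKC or CK2 sites, which agrees with the listing.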
DKFZphutel_18il9 


group: transcription factors 


DKFZphutel_18il9 encodes a novel 759 amino acid protein with similarity to the SREBP-2 mutant 
sterol regulatory element binding protein-2 of Cricetulus griseus. 

The SREBP-2 protein is embedded in the membranes of the nucleus and endoplasmic reticulum. In 
cholesterol-depleted cells the proteins are cleaved to release soluble NH2-terminal fragments 
that enter the nucleus and activate genes encoding the low density lipoprotein receptor and 
enzymes of cholesterol synthesis. The new protein is a putative transcription factor capable 
of protein-protein interaction via a LIM domain and additionally shows similarity to the 
common sunflower transcription factor SF3. 

The new protein can find application in modulating/blocking the expression of genes involved 
in lipid metabolism. 


similarity to transcription factor SF3 

complete cDNA, complete cds, EST hits 

strong similarity to mutated SREBP-2 of hamster. 

Similarity is not to SREBP-2 part of protein but to the unknown part of 
the fusion protein 

Sequenced by AGOWA 

Locus: /map="12" 

Insert length: 3664 bp 

Poly A stretch at pos. 3647, polyadenylation signal at pos. 3636 


1 GCGCTAGGTA GAGCGCCGGG ACCTGTGACA GGGCTGGTAG CAGCGCAGAG 
51 GAAAGGCGGC TTTTAGCCAG GTATTTCAGT GTCTGTAGAC AAGATGGAAT 
101 CATCTCCATT TAATAGACGG CAATGGACCT CACTATCATT GAGGGTAACA 
151 GCCAAAGAAC TTTCTCTTGT CAACAAGAAC AAGTCATCGG CTATTGTGGA 
201 AATATTCTCC AAGTACCAGA AAGCAGCTGA AGAAACAAAC ATGGAGAAGA 
251 AGAGAAGTAA CACCGAAAAT CTCTCCCAGC ACTTTAGAAA GGGGACCCTG 
301 ACTGTGTTAA AGAAGAAGTG GGAGAACCCA GGGCTGGGAG CAGAGTCTCA 
351 CACAGACTCT CTACGGAACA GCAGCACTGA GATTAGGCAC AGAGCAGACC 
401 ATCCTCCTGC TGAAGTGACA AGCCACGCTG CTTCTGGAGC CAAAGCTGAC 
451 CAAGAAGAAC AAATCCACCC CAGATCTAGA CTCAGGTCAC CTCCTGAAGC 
501 CCTCGTTCAG GGTCGATATC CCCACATCAA GGACGGTGAG GATCTTAAAG 
551 ACCACTCAAC AGAAAGTAAA AAAATGGAAA ATTGTCTAGG AGAATCCAGG 
601 CATGAAGTAG AAAAATCAGA AATCAGTGAA AACACAGATG CTTCGGGCAA 
651 AATAGAGAAA TATAATGTTC CGCTGAACAG GCTTAAGATG ATGTTTGAGA 
701 AAGGTGAACC AACTCAAACT AAGATTCTCC GGGCCCAAAG CCGAAGTGCA 
751 AGTGGAAGGA AGATCTCTGA AAACAGCTAT TCTCTAGATG ACCTGGAAAT 
801 AGGCCCAGGT CAGTTGTCAT CTTCTACATT TGACTCGGAG AAAAATGAGA 
851 GTAGACGAAA TCTGGAACTT CCACGCCTCT CAGAAACCTC TATAAAGGAT 
901 CGAATGGCCA AGTACCAGGC AGCTGTGTCC AAACAAAGCA GCTCAACCAA 
951 CTATACAAAT GAGCTGAAAG CCAGTGGTGG CGAAATCAAA ATTCATAAAA 
1001 TGGAGCAAAA GGAGAATGTG CCCCCAGGTC CTGAGGTCTG CATCACCCAT 
1051 CAGGAAGGGG AAAAGATTTC TGCAAATGAG AATAGCCTGG CAGTCCGTTC 
1101 CACCCCTGCC GAAGATGACT CCCGTGACTC CCAGGTTAAG AGTGAGGTTC 
1151 AACAGCCTGT CCATCCCAAG CCACTAAGTC CAGATTCCAG AGCCTCCAGT 
1201 CTTTCTGAAA GTTCTCCTCC CAAAGCAATG AAGAAGTTTC AGGCACCTGC 
1251 AAGAGAGACC TGCGTGGAAT GTCAGAAGAC AGTCTATCCA ATGGAGCGTC 
1301 TCTTGGCCAA CCAGCAGGTG TTTCACATCA GCTGCTTCCG TTGCTCCTAT 
1351 TGCAACAACA AACTCAGTCT AGGAACATAT GCATCTTTAC ATGGAAGAAT 
1401 CTATTGTAAG CCTCACTTCA ATCAACTCTT TAAATCTAAG GGCAACTATG 
1451 ATGAAGGCTT TGGGCACAGA CCACACAAGG ATCTATGGGC AAGCAAAAAT 
1501 GAAAACGAAG AGATTTTGGA GAGACCAGCC CAGCTTGCAA ATGCAAGGGA 
1551 GACCCCTCAC AGCCCAGGGG TAGAAGATGC CCCTATTGCT AAGGTGGGTG 
1601 TCCTGGCTGC AAGTATGGAA GCCAAGGCCT CCTCTCAGCA GGAGAAGGAA 
1651 GACAAGCCAG CTGAAACCAA GAAGCTGAGG ATCGCCTGGC CACCCCCCAC 
1701 TGAACTTGGA AGTTCAGGAA GTGCCTTGGA GGAAGGGATC AAAATGTCAA 
1751 AGCCCAAATG GCCTCCTGAA GACGAAATCA GCAAGCCCGA AGTTCCTGAG 
1801 GATGTCGATC TAGATCTGAA GAAGCTAAGA CGATCTTCTT CACTGAAGGA 
1851 AAGAAGCCGC CCATTCACTG TAGCAGCTTC ATTTCAAAGC ACCTCTGTCA 
1901 AGAGCCCAAA AACTGTGTCC CCACCTATCA GGAAAGGCTG GAGCATGTCA 
1951 GAGCAGAGTG AAGAGTCTGT GGGTGGAAGA GTTGCAGAAA GGAAACAAGT 
2001 GGAAAATGCC AAGGCTTCTA AGAAGAATGG GAATGTGGGA AAAACAACCT 
2051 GGCAAAACAA AGAATCTAAA GGAGAGACAG GGAAGAGAAG TAAGGAAGGT 
2101 CATAGTTTGG AGATGGAGAA TGAGAATCTT GTAGAAAATG GTGCAGACTC 
2151 CGATGAAGAT GATAACAGCT TCCTCAAACA ACAATCTCCA CAAGAACCCA 
2201 AGTCTCTGAA TTGGTCGAGT TTTGTAGACA ACACCTTTGC TGAAGAATTC 
2251 ACTACTCAGA ATCAGAAATC CCAGGATGTG GAACTCTGGG AGGGAGAAGT 
2301 GGTCAAAGAG CTCTCTGTGG AAGAACAGAT AAAGAGAAAT CGGTATTATG 
2351 ATGAGGATGA GGATGAAGAG TGACAAATTG CAATGATGCT GGGCCTTAAA 
2401 TTCATGTTAG TGTTAGCGAG CCACTGCCCT TTGTCAAAAT GTGATGCACA 
2451 TAAGCAGGTA TCCCAGCATG AAATGTAATT TACTTGGAAG TAACTTTGGA 
2501 AAAGAATTCC TTCTTAAAAT CAAAAACAAA ACAAAAAAAC ACAAAAAACA 
2551 CATTCTAAAT ACTAGAGATA ACTTTACTTA AATTCTTCAT TTTAGCAGTG 
2601 ATGATATGCG TAAGTGCTGT AAGGCTTGTA ACTGGGGAAA TATTCCACCT 
2651 GATAATAGCC CAGATTCTAC TGTATTCCCA AAAGGCAATA TTAAGGTAGA 
2701 TAGATGATTA GTAGTATATT GTTACACACT ATTTTGGAAT TAGAGAACAT 
2751 ACAGAAGGAA TTTAGGGGCT TAAACATTAC GACTGAATGC ACTTTAGTAT 
2801 AAAGGGCACA GTTTGTATAT TTTTAAATGA ATACCAATTT AATTTTTTAG 
2851 TATTTACCTG TTAAGAGATT ATTTAGTCTT TAAATTTTTT AGGTTAATTT 
2901 TCTTGCTGTG ATATATATGA GGAATTTACT ACTTTATGTC CTGCTCTCTA 
2951 AACTACATCC TGAACTCGAC GTCCTGAGGT ATAATACAAC AGAGCACTTT 
3001 TTGAGGCAAT TGAAAAACCA ACCTACACTC TTCGGTGCTT AGAGAGATCT 
3051 GCTGTCTCCC AAATAAGCTT TTGTATCTGC CAGTGAATTT ACTGTACTCC 
3101 AAATGATTGC TTTCTTTTCT GGTGATATCT GTGCTTCTCA TAATTACTGA 
3151 AAGCTGCAAT ATTTTAGTAA TACCTTCGGG ATCACTGTCC CCCATCTTCC 
3201 GTGTTAGAGC AAAGTGAAGA GTTTAAAGGA GGAAGAAGAA AGAACTGTCT 
3251 TACACCACTT GAGCTCAGAC CTCTAAACCC TGTATTTCCC TTATGATGTC 
3301 CCCTTTTTGA GACACTAATT TTTAAATACT TACTAGCTCT GAAATATATT 
3351 GATTTTTATC ACAGTATTCT CAGGGTGAAA TTAAACCAAC TATAGGCCTT 
3401 TTTCTTGGGA TGATTTTCTA GTCTTAAGGT TTGGGGACAT TATAAACTTG 
3451 AGTACATTTG TTGTACACAG TTGATATTCC AAATTGTATG GATGGGAGGG 
3501 AGAGGTGTCT TAAGCTGTAG GCTTTTCTTT GTACTGCATT TATAGAGATT 
3551 TAGCTTTAAT ATTTTTTAGA GATGTAAAAC ATTCTGCTTT CTTAGTCTTA 
3601 CCTAGTCTGA AACATTTTTA TTCAATAAAG ATTTTAATTA AAATTTGAAA 
3651 AAAAAAAAAA AAAA 


BLAST Results 


Entry HS512217 from database EMBL: 
human STS SHGC-14654. 
Length = 250 
Minus Strand HSPs: 

Score = 1202 (180.3 bits), Expect = 1.8e-46, P = 1.8e-46 
Identities = 242/244 (99%) 


Medline entries 


95263566: 

Three different rearrangements in a single intron truncate 
sterol regulatory element binding protein-2 and produce 
sterol-resistant phenotype in three cell lines. Role of introns 
in protein evolution. 

93258417: 

Characterization of a pollen-specific cDNA from sunflower 
encoding a zinc finger protein. 


Peptide information for frame 1 


ORF from 94 bp to 2370 bp; peptide length: 759 
Category: similarity to known protein 


1 MESSPFNRRQ WTSLSLRVTA KELSLVNKNK SSAIVEIFSK YQKAAEETNM 
51 EKKRSNTENL SQHFRKGTLT VLKKKWENPG LGAESHTDSL RNSSTEIRHR 
101 ADHPPAEVTS HAASGAKADQ EEQIHPRSRL RSPPEALVQG RYPHIKDGED 
151 LKDHSTESKK MENCLGESRH EVEKSEISEN TDASGKIEKY NVPLNRLKMM 
201 FEKGEPTQTK ILRAQSRSAS GRKISENSYS LDDLEIGPGQ LSSSTFDSEK 
251 NESRRNLELP RLSETSIKDR MAKYQAAVSK QSSSTNYTNE LKASGGEIKI 
301 HKMEQKENVP PGPEVCITHQ EGEKISANEN SLAVRSTPAE DDSRDSQVKS 
351 EVQQPVHPKP LSPDSRASSL SESSPPKAMK KFQAPARETC VECQKTVYPM 
401 ERLLANQQVF HISCFRCSYC NNKLSLGTYA SLHGRIYCKP HFNQLFKSKG 
451 NYDEGFGHRP HKDLWASKNE NEEILERPAQ LANARETPHS PGVEDAPIAK 
501 VGVLAASMEA KASSQQEKED KPAETKKLRI AWPPPTELGS SGSALEEGIK 
551 MSKPKWPPED EISKPEVPED VDLDLKKLRR SSSLKERSRP FTVAASFQST 
601 SVKSPKTVSP PIRKGWSMSE QSEESVGGRV AERKQVENAK ASKKNGNVGK 
651 TTWQNKESKG ETGKRSKEGH SLEMENENLV ENGADSDEDD NSFLKQQSPQ 
701 EPKSLNWSSF VDNTFAEEFT TQNQKSQDVE LWEGEVVKEL SVEEQIKRNR 
751 YYDEDEDEE 

BLASTP hits 
Entry CG22818_1 from database TREMBL: 

"SREBP-2"; product: "mutant sterol regulatory element binding 
protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory 
element binding protein-2 (SREBP-2) mRNA, complete cds. Cricetulus 
griseus (Chinese hamster) 
Length = 839 

Score = 1502 (528.7 bits), Expect = 3.9e-154, P = 3.9e-154 
Identities = 290/380 (76%), Positives = 322/380 (84%) 

Entry S28507 from database PIR: 

transcription factor SF3 - common sunflower 

Length = 219 

Score = 212 (74.6 bits), Expect = 6.3e-18, Sum P(2) = 6.3e-18 
Identities = 36/82 (43%), Positives = 55/82 (67%) 

Entry NTLIMDOM1 from database TREMBL: 

"SF3"; product: "LIM-domain SF3 protein"; N.tabacum mRNA for 
LIM-domain protein Nicotiana tabacum (common tobacco) 
Length = 189 

Score = 216 (76.0 bits), Expect = 1.0e-16, P = 1.0e-16 
Identities = 42/94 (44%), Positives = 57/94 (60%) 

Alert BLASTP hits for DKFZphutel_18il9, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18il9, frame 1 

Report for DKFZphutel_18il9.1 

[LENGTH]   759 
[MW]       85225.57 
[pI]       6.41 
[HOMOL]    TREMBL:CG22818_1 gene: "SREBP-2"; product: "mutant sterol regulatory element binding protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory element binding protein-2 (SREBP-2) mRNA, complete cds. 1e-151 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YLR257w] 3e-05 
[FUNCAT]   05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04 
[FUNCAT]   30.03 organization of cytoplasm [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04 
[BLOCKS]   BL00478B 
[PIRKW]    zinc finger 9e-16 
[PIRKW]    DNA binding 9e-16 
[SUPFAM]   LIM metal-binding repeat homology 9e-16 
[PROSITE]  MYRISTYL 6 
[PROSITE]  LIM_DOMAIN_1 1 
[PROSITE]  AMIDATION 2 
[PROSITE]  CAMP_PHOSPHO_SITE 4 
[PROSITE]  CK2_PHOSPHO_SITE 28 
[PROSITE]  TYR_PHOSPHO_SITE 2 
[PROSITE]  PKC_PHOSPHO_SITE 15 
[PROSITE]  ASN_GLYCOSYLATION 6 
[PFAM]     LIM domain containing proteins 
[KW]       Irregular 
[KW]       3D 
[KW]       LOW_COMPLEXITY 5.53 % 

SEQ MESSPFNRRQWTSLSLRVTAKELSLVNKNKSSAIVEIFSKYQKAAEETNMEKKRSNTENL 

SEG 

icti- 

SEQ SQHFRKGTLTVLKKKWENPGLGAESHTDSLRNSSTEIRHRADHPPAEVTSHAASGAKADQ 
SEG 

icti- 

SEQ EEQIHPRSRLRSPPEALVQGRYPHIKDGEDLKDHSTESKKMENCLGESRHEVEKSEISEN 
SEG 

icti- ^ ^ ^ !!!!!!!!!!!!!!!!!!!!!!!!! ! 

SEQ TDASGKIEKYNVPLNRLKMMFEKGEPTQTKILRAQSRSASGRKISENSYSLDDLEIGPGQ 
SEG 
Ictl- 


SEQ LSSSTFDSEKNESRRNLELPRLSETSIKDRMAKYQAAVSKQSSSTNYTNELKASGGEIKI 

SEG 

Ictl- 

SEQ HKMEQKENVPPGPEVCITHQEGEKISANENSLAVRSTPAEDDSRDSQVKSEVQQPVHPKP 

SEG X 

ICtl- 

SEQ LSPDSRASSLSESSPPKAMKKFQAPARETCVECQKTVYPMERLLANQQVFHISCFRCSYC 

SEG xxxxxxxxxxxxxxxx 

Ictl - ETTTTEEETTTCEEEETTEEEETTTTBTTTT 

SEQ NNKLSLGTYASLHGRIYCKPHFNQLFKSKGNYDEGFGHRPHKDLWASKNENEEILERPAQ 

SEG 

Ictl- TCBCBTTBEEEETTEEEETTTTTTTTTTCCTTTTTTTCTTT 

SEQ LANARETPHSPGVEDAPIAKVGVLAASMEAKASSQQEKEDKPAETKKLRIAWPPPTELGS 

SEG 

Ictl- 

SEQ SGSALEEGIKMSKPKWPPEDEISKPEVPEDVDLDLKKLRRSSSLKERSRPFTVAASFQST 

SEG xxxxxxxxxxxxxxxxxx 


SEQ SVKSPKTVSPPIRKGWSMSEQSEESVGGRVAERKQVENAKASKKNGNVGKTTWQNKESKG 

SEG 

Ictl- 

SEQ ETGKRSKEGHSLEMENENLVENGADSDEDDNSFLKQQSPQEPKSLNWSSFVDNTFAEEFT 

SEG 

Ictl- 

SEQ TQNQKSQDVELWEGEVVKELSVEEQIKRNRYYDEDEDEE 

SEG xxxxxxx 


Ictl- 


Ictl- 
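
The LOW_COMPLEXITY percentage in the report header can be recovered from the SEG mask lines above by counting masked residues and dividing by the protein length. This is a minimal sketch with a hypothetical function name; it assumes each `x` in a SEG line masks exactly one residue and that the run lengths used below (1, 16, 18 and 7) were read correctly from the OCR'd SEG lines.

```python
def masked_percent(mask_lines, protein_length, symbol="x"):
    """Percentage of residues covered by a mask symbol, to two decimal places.

    Pass only the mask text, without the leading 'SEG' or 'COILS' label,
    so that letters in the label are not counted.
    """
    symbol = symbol.lower()
    masked = sum(line.lower().count(symbol) for line in mask_lines)
    return round(100 * masked / protein_length, 2)


# Run lengths of 'x' in the four non-empty SEG lines of the 759-residue protein.
seg_masks = ["x", "x" * 16, "x" * 18, "x" * 7]
print(masked_percent(seg_masks, 759))   # 5.53
```

The same helper applied to a 30-residue coiled-coil mask on a 378-residue protein, with `symbol="C"`, gives 7.94, matching the figure reported for the preceding clone.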


Prosite for DKFZphutel_18il9.1 


PS00001    29->33     ASN_GLYCOSYLATION    PDOC00001 
PS00001    59->63     ASN_GLYCOSYLATION    PDOC00001 
PS00001    92->96     ASN_GLYCOSYLATION    PDOC00001 
PS00001    251->255   ASN_GLYCOSYLATION    PDOC00001 
PS00001    286->290   ASN_GLYCOSYLATION    PDOC00001 
PS00001    706->710   ASN_GLYCOSYLATION    PDOC00001 
PS00004    52->56     CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    65->69     CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    222->226   CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    579->583   CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    15->18     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    19->22     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    89->92     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    158->161   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    184->187   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    220->223   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    248->251   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    253->256   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    266->269   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    525->528   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    583->586   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    601->604   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    604->607   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    642->645   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    662->665   PKC_PHOSPHO_SITE     PDOC00005 
PS00006    19->23     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    48->52     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    55->59     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    85->89     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    93->97     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    132->136   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    168->172   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    230->234   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    244->248   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    266->270   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    294->298   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    318->322   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    326->330   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    337->341   CK2_PHOSPHO_SITE     PDOC00006 
PDOC00006 
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PS00006 

369 

->373 

CK2 PHOSPHO 

SITE 

f U\J\^ U U U U O 

PS00006 

389 

->393 

CK2 PHOSPHO' 

"site 

D nor* c\f\nnc 

PS00006 

4 67 

->471 

CK2~PH0SPH0* 

"site 


PS00006 

514 

->518 

CK2'~PHOSPH0' 

"site 

rlAA,UUUUO 

PS00006 

543 

->547 

CK2 PHOSPHO' 

"site 

fLMJuUUUUo 

PS00006 

563 

->567 

CK2 PHOSPHO' 

"site 


PS00006 

583 

->587 

CK2 PHOSPHO" 

'site 

fUUCUUUUo 

PS00006 

617- 

->621 

CK2 PHOSPHO' 

~SITE 


PS00006 

658- 

->662 

CK2 PHOSPHO" 

"site 

T\y\/^ A A A A ^ 

PS00006 

686- 

->690 

CK2 PHOSPHO" 

SITE 

Dr\/'V* AA AA t 

PS00006 

698- 

->702 

CK2 PHOSPHO SITE 

DtW^ AAA A £ 

PS00006 

709- 

->713 

CK2 PHOSPHO 

.SITE 

PDOC00006 

PS00006 

714- 

->718 

CK2 PHOSPHO' 

SITE 

PDOC00006 

PS00006 

741- 

->745 

CK2 PHOSPHO" 

SITE 

PDOC00006 

PS00007 

223- 

->230 

TYR PHOSPHO" 

SITE 

PDOC00007 

PS00007 

222- 

->230 

TYR PHOSPHO 

SITE 

PDOC00007 

PS00008 

239- 

■>245 

MYRISTYL 


P0OC00008 

PS00008 

427- 

->433 

MYRIST^L 


PDOC00008 

PS00008 

502- 

■>508 

MYRISTYL 


PDOC00008 

PS00008 

539- 

■>545 

MYRISTYL 


PDOC00008 

PS00008 

548- 

>554 

MYRISTYL 


PDOCO0008 

PS00008 

527- 

>633 

MYRISTYL 


PDOC00008 

PS00009 

220- 

>224 

AMI DAT I ON 


PDOC00009 

PS00009 

662- 

>666 

AMI DAT I ON 


PDOC00009 

PS00478 

390- 

>425 

LIM DOMAIN 1 


PDOC00382 


Pfam for DKFZphutel_18il9.1


HMM_NAME  LIM domain containing proteins

HMM        *CagCNrpIyDREivMRAMNKvWHpECFrCcdCqqPLtegdeFYErDGrl
           C c++++Y-^ E-^-^ A+ V+H++CFRC+ C+ L+ G+ + ++ GRI
Query  390 CVECQKTVYPMERLL-ANQQVFHISCFRCSYCNNKLSLGT-YASLHGRI

HMM        YCKhDYYrrFg*
           YCK+++ ++F+
Query  437 YCKPHFNQLFK 447


DKFZphutel_18i4


group: uterus derived 

DKFZphutel_18i4 encodes a novel 220 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.


weak similarity to C. elegans D2085.2
complete cDNA, complete cds, few EST hits
Sequenced by AGOWA
Locus: /map="7q31"
Insert length: 1568 bp

Poly A stretch at pos. 1551, polyadenylation signal at pos. 1523


1 GCCGAGCGGA GAGGGTAGAG ACGGGGTTTC ACCGTGTTAG CCAAGATGGT 
51 CTCGATCTCC TGACCTCGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTG 
101 GGATTACAGG CGTGAGCCAC TGCGCCCGGC CTGTTGTACA GTTATTAAAG 
151 TTATCATTTA ACATGGAAGA AGATGAGTTC ATTGGAGAAA AAACATTCCA 
201 ACGTTATTGT GCAGAATTCA TTAAACATTC ACAACAGATA GGTGATAGTT 
251 GGGAATGGAG ACCATCAAAG GACTGTTCTG ATGGCTACAT GTGCAAAATA 
301 CACTTTCAAA TTAAGAATGG GTCTGTGATG TCACATCTAG GAGCATCTAC 
351 CCATGGACAG ACATGTCTTC CCATGGAGGA GGCTTTCGAG CTACCCTTGG 
401 ATGATTGTGA AGTGATTGAA ACTGCAGCAG CGTCCGAAGT GATTAAATAT 
451 GAGTATCATG TCTTATATTC CTGTAGCTAC CAAGTGCCTG TACTTTACTT 
501 TAGGGCAAGC TTTTTAGATG GGAGACCTTT AACTCTGAAG GACATATGGG 
551 AAGGAGTTCA TGAGTGCTAT AAGATGCGAC TGCTACAGGG ACCATGGGAC 
601 ACTATTACGC AACAGGAACA TCCAATACTT GGGCAACCCT TTTTTGTACT 
651 TCATCCCTGC AAGACGAATG AATTCATGAC TCCTGTATTA AAGAATTCTC 
701 AGAAAATCAA TAAGAATGTC AACTATATCA CATCATGGCT GAGCATTGTA 
751 GGGCCAGTTG TTGGGCTGAA TCTACCTCTG AGTTATGCCA AAGCAACGTC 
801 TCAGGATGAA CGAAATGTCC CTTAACAAGA TTCTTCTATT GAGTTTAGGA 
851 ATTGCGGCAC GAAGAATGCC AAGAGTTTAC CTGGCCAGCC CTGGCTTTAA 
901 TAGGACTGAT ACCATGGAAT ATTTCATCTC ACCAAGATGT GACATGGATT 
951 ATTTTTCCCT TGGACACAAA TGTCTACAGC AACTGATGTT TGATAGGCTG 
1001 AATGTTTAGA AGAAACACTT CAAAGGGATA CATCATGGCC AGGCATGGTG 
1051 GCTCACACCT GTAATCCAAG CACTTTGGGA GGCCAAGGTG GGAGCATCAC 
1101 TTGATCCTGG GAGTTCGAGA CCAGCCTGGG CAACATGGTG AAACCCTGTC 
1151 GGTACAAAAA AATACAAAAA TTTGCCTGTT TATGGTGGTG TGTTCCTGTA 
1201 GTCCCAGCTC CCCAGGAGGC TGAGGTGGGA GGTTGGCTTT AACCCAGGAG 
1251 GCAGAGGTTG CAGTGAGCTG AGACTGTGCC ACTGCAGTCC AGCCTGGGTG 
1301 ACAGAGCCAG ACACTGTCTC GGGAAAAAAA AAAAAAAAAA AAAGACACAT 
1351 CACTATAAAT AGCAAAAAAA CAAATCTAAC TTATTAATAC TAGGAATACC 
1401 AACATTATTA GGGCACTTGC AGGTTATTCT TTTCTAGGCC AAGTACTTCA 
1451 CTTCCATTTG TCTGACATGG AGATTGAGGG AGAAATGTAT TTGTGTGTTC 
1501 ATTTTAATGT AAGATATATA AAAATTAAAT TACTGGATTT ACCTGTCCCT 
1551 GAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries


No Medline entry 


Peptide information for frame 1 


ORF from 163 bp to 822 bp; peptide length: 220 
Category: similarity to unknown protein

1 MEEDEFIGEK TFQRYCAEFI KHSQQIGDSW EWRPSKDCSD GYMCKIHFQI
51 KNGSVMSHLG ASTHGQTCLP MEEAFELPLD DCEVIETAAA SEVIKYEYHV
101 LYSCSYQVPV LYFRASFLDG RPLTLKDIWE GVHECYKMRL LQGPWDTITQ
151 QEHPILGQPF FVLHPCKTNE FMTPVLKNSQ KINKNVNYIT SWLSIVGPVV
201 GLNLPLSYAK ATSQDERNVP

BLASTP hits


Entry CED2085_2 from database TREMBL: 
"D2085.2"; Caenorhabditis elegans cosmid D2085 
Length = 173 

Score = 167 (58.8 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 36/121 (29%), Positives = 64/121 (52%)
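
The percentages in the BLASTP summary above follow from the match counts. A minimal sketch, assuming the report truncates to a whole number rather than rounding (an inference from the figures, since 36/121 is 29.75%); the function name is illustrative only:

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (not rounded), as the figures above suggest."""
    return (100 * matches) // aligned


# Identities and Positives quoted for the C. elegans D2085.2 hit.
print(blast_percent(36, 121), blast_percent(64, 121))
```

Rounding to the nearest integer would give 30% and 53% instead, which is why truncation is assumed here.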


Alert BLASTP hits for DKFZphutel_18i4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphutel_18i4, frame 1


Report for DKFZphutel_18i4.1

[LENGTH] 220
[MW] 25278.99
[pI] 5.34
[HOMOL] TREMBL:CED2085_2 gene: "D2085.2"; Caenorhabditis elegans cosmid D2085 2e-11
[BLOCKS] BL00221E
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 1
[KW] AlphaBeta

SEQ MEEDEFIGEKTFQRYCAEFIKHSQQIGDSWEWRPSKDCSDGYMCKIHFQIKNGSVMSHLG

PRD cccccccchhhhhhhhhhhhhhhhcccccccccccccccceeeeeeeeeeeccceeeeec 

SEQ ASTHGQTCLPMEEAFELPLDDCEVIETAAASEVIKYEYHVLYSCSYQVPVLYFRASFLDG 
PRD cccccccchhhhhhhhccccceeehhhhhchhhhhhhheeeeccccceeeeeeecccccc 

SEQ RPLTLKDIWEGVHECYKMRLLQGPWDTITQQEHPILGQPFFVLHPCKTNEFMTPVLKNSQ
PRD cccccchhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccccccccccc 

SEQ KINKNVNYITSWLSIVGPVVGLNLPLSYAKATSQDERNVP 
PRD ccccccccccccceeeeccccccccceeeecccccccccc 


Prosite for DKFZphutel_18i4.1


PS00001   52->56     ASN_GLYCOSYLATION   PDOC00001
PS00005   124->127   PKC_PHOSPHO_SITE    PDOC00005
PS00005   179->182   PKC_PHOSPHO_SITE    PDOC00005
PS00006   116->120   CK2_PHOSPHO_SITE    PDOC00006
PS00006   124->128   CK2_PHOSPHO_SITE    PDOC00006
PS00006   149->153   CK2_PHOSPHO_SITE    PDOC00006
PS00006   212->216   CK2_PHOSPHO_SITE    PDOC00006
PS00008   53->59     MYRISTYL            PDOC00008
PS00008   131->137   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphutel_18i4.1)


DKFZphutel_1811


group: nucleic acid management 

DKFZphtes3_15jl8 encodes a novel 184 amino acid protein with similarity to S. cerevisiae 
putative ribosomal protein YHR148w. 

The novel protein is similar to several 40S ribosomal proteins and therefore seems to be part of
the corresponding ribosome subunit.

The new protein can find application in modulation of ribosome assembly, structure and 
function. 


strong similarity to S. cerevisiae YHR148w

complete cDNA, complete cds, EST hits,

potential start at Bp 45 matches Kozak consensus ANNatgG

gene disruption of YHR148w is lethal!

Sequenced by AGOWA

Locus: unknown

Insert length: 1076 bp

Poly A stretch at pos. 1035, polyadenylation signal at pos. 1006


1 GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG 
51 CGGAAGCTTA AGTTCCACGA GCAGAAGCTG CTGAAGCAGG TGGACTTCCT 
101 GAACTGGGAG GTCACCGACC ACAACCTGCA CGAGCTGCGC GTGCTGCGGC 
151 GTTACCGGCT GCAGCGGCGG GAGGACTACA CGCGCTACAA CCAGCTGAGC 
201 CGTGCCGTGC GTGAGCTGGC GCGGCGCCTG CGCGACCTGC CCGAACGCGA 
251 CCAGTTCCGC GTGCGCGCTT CGGCCGCGCT GCTGGACAAG CTGTATGCTC 
301 TCGGCTTGGT GCCCACGCGC GGTTCGCTGG AGCTCTGCGA CTTCGTCACG 
351 GCCTCGTCCT TCTGCCGCCG CCGCCTCCCC ACCGTGCTCC TCAAGCTGCG 
401 CATGGCGCAG CACCTTCAGG CTGCCGTGGC CTTTGTGGAG CAAGGGCACG 
451 TACGCGTGGG CCCTGACGTG GTTACCGACC CCGCCTTCCT TGTCACGCGC 
501 AGCATGGAGG ACTTTGTCAC TTGGGTGGAC TCGTCCAAGA TCAAGCGGCA 
551 CGTGCTAGAG TACAATGAGG AGCGCGATGA CTTCGATCTG GAAGCCTAGC 
601 GGATCTCCCA CTTTGCATGG CTGTCTTTTA CAGATGGGAA AACTGAGGCC 
651 TGATGCTGGA GATTCTATGA GGGTGCTCTC CTCAAGGGTA TCAGACGGTC 
701 GTAGGTTCTT AAGAATTTGA TTCATCAGTG GCAGGCCATG CATAGAGCCA 
751 CGGGAGGTGC GTCCTTGTTT TCCAGGAAAT GTTCTTAGAA CTTGGACTAC 
801 TGATTATTAA TTGACTGTGC CTTGGGAAAC AGTGGGAAGT AACTTGGTGC 
851 AGCACTGGGG TATTGTTGGA CTGGTTCAAT TCGTTTAACT CGAATTCTTG 
901 CTCCTGGCCG TGGTTAAGCT GTGTACAGAT GATGGAGAGT TTGGCCTCAA 
951 GTTTTTATAA ACTGAGCGAG ACTAGTGTTC AGGATCTCCT CCCTTGTTTA 

1001 AATGTCAATA AATGCCCCAA CTGCTTTGTA AGCTCAAAAA AAAAAAAAAA 

1051 AAAAAAAAAA AAAAAAAAAA AAAAAA 
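
The poly(A) and polyadenylation-signal positions quoted for this clone can be cross-checked against the 3' end of the listed sequence. The sketch below is illustrative only: it assumes that positions are 0-based offsets and that the signal is an AATAAA or ATTAAA hexamer (both are inferences from the figures in this record, not stated conventions), and the helper names are hypothetical.

```python
import re

# Last 76 bases of the 1076 bp insert listed above (rows 1001 and 1051).
TAIL = (
    "AATGTCAATA" "AATGCCCCAA" "CTGCTTTGTA" "AGCTCAAAAA" "AAAAAAAAAA"
    "AAAAAAAAAA" "AAAAAAAAAA" "AAAAAA"
)
TAIL_OFFSET = 1000  # number of bases that precede TAIL in the full insert


def poly_a_start(seq: str) -> int:
    """0-based index at which the terminal run of A begins."""
    return len(seq.rstrip("A"))


def last_signal(seq: str, end: int):
    """0-based index of the last AATAAA/ATTAAA hexamer before `end`, or None."""
    hits = [m.start() for m in re.finditer(r"(?=A[AT]TAAA)", seq[:end])]
    return hits[-1] if hits else None


tail_start = poly_a_start(TAIL)
signal = last_signal(TAIL, tail_start)
print(TAIL_OFFSET + tail_start, TAIL_OFFSET + signal)
```

Under these assumptions the script prints 1035 and 1006, the two positions given in the header of this record.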


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 45 bp to 596 bp; peptide length: 184 
Category: strong similarity to known protein 
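
The note that the potential start at bp 45 matches the Kozak consensus ANNatgG can be checked directly on the first row of the sequence. This is a minimal sketch that tests only the positions named in that consensus (A at -3, G at +4); the function name is hypothetical and the position is taken as 1-based.

```python
import re

# First 50 bases of the DKFZphutel_1811 insert as listed above.
FIVE_PRIME = "GCGCGCTCTC" "AGCTTCGGGT" "CCTGCGGCTG" "CGGCTGCCGC" "CATCATGGTG"


def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Test the minimal ANNatgG context around a 1-based ATG position."""
    # Bases -3 to +4 relative to the A of the ATG (seven bases in total).
    window = seq[atg_pos - 4 : atg_pos + 3]
    return re.fullmatch(r"A..ATGG", window) is not None


print(matches_kozak(FIVE_PRIME, 45))
```

The seven-base window around position 45 is ATCATGG, so the check succeeds for this clone.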


1 MVRKLKFHEQ KLLKQVDFLN WEVTDHNLHE LRVLRRYRLQ RREDYTRYNQ 

51 LSRAVRELAR RLRDLPERDQ FRVRASAALL DKLYALGLVP TRGSLELCDF 

101 VTASSFCRRR LPTVLLKLRM AQHLQAAVAF VEQGHVRVGP DVVTDPAFLV 

151 TRSMEDFVTW VDSSKIKRHV LEYNEERDDF DLEA 
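
The peptide above can be reproduced from the nucleotide sequence by translating the ORF in frame. The following sketch uses the standard genetic code and translates only the first ten codons (bases 45 to 74 of the insert); the helper name is illustrative, and a full check would pass the whole ORF.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons of `dna` until a stop codon or the end."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i : i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 45 to 74 of the insert: the first ten codons of the ORF.
print(translate("ATGGTGCGGAAGCTTAAGTTCCACGAGCAG"))
```

The result, MVRKLKFHEQ, agrees with the first block of the peptide listed for frame 3.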

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_1811, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_1811, frame 3 

Report for DKFZphutel_1811.3

[LENGTH] 184
[MW] 21850.21
[pI] 9.54
[HOMOL] PIR:S33911 probable ribosomal protein YHR148w - yeast (Saccharomyces cerevisiae) 4e-47
[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YHR148w] 2e-48
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPLOSlw] 5e-07
[FUNCAT] j mrna translation and ribosome biogenesis [M. jannaschii, MJ0190] 8e-05
[BLOCKS] BL00632
[PIRKW] cytosol 1e-07
[PIRKW] ribosome 1e-07
[PIRKW] protein biosynthesis 1e-07
[SUPFAM] rat ribosomal protein S9 1e-07
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 1
[PFAM] Ribosomal protein S4
[KW] All_Alpha
[KW] LOW_COMPLEXITY 6.52 %
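
The LOW_COMPLEXITY figure appears to be the share of residues masked with "x" in the SEG line of the report. A minimal sketch under that assumption (the function name is illustrative; the twelve masked residues are counted from the SEG line shown for this peptide):

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask, to two decimals."""
    return round(100 * seg_mask.count("x") / length, 2)


# DKFZphutel_1811.3: twelve masked residues in a 184-residue peptide.
print(low_complexity_percent("x" * 12, 184))
```

This gives 6.52, the value in the [KW] line; the same arithmetic with 21 masked residues out of 204 gives the 10.29 % reported for DKFZphutel_19f19.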


SEQ MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYNQLSRAVRELAR
SEG xxxxxxxxxxxx
PRD ccchhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ RLRDLPERDQFRVRASAALLDKLYALGLVPTRGSLELCDFVTASSFCRRRLPTVLLKLRM
SEG
PRD hhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ AQHLQAAVAFVEQGHVRVGPDVVTDPAFLVTRSMEDFVTWVDSSKIKRHVLEYNEERDDF
SEG
PRD hhhhhhhhhhhhhhhccccceeecccceeeeeccccceeeeeccchhhhhhhhhcccccc

SEQ DLEA
SEG
PRD cccc
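
The PRD lines give a per-residue secondary structure prediction, and a keyword such as All_Alpha presumably summarizes their composition. The sketch below only computes the fractions; it assumes the usual reading of h, e and c as helix, strand and coil, and it does not attempt to reproduce whatever thresholds the annotation software used for its keywords.

```python
from collections import Counter

# Third PRD line of the report above (residues 121 to 180).
PRD = "hhhhhhhhhhhhhhhccccceeecccceeeeeccccceeeeeccchhhhhhhhhcccccc"


def composition(prd: str) -> dict:
    """Fraction of residues predicted as helix (h), strand (e) and coil (c)."""
    counts = Counter(prd.lower())
    total = sum(counts[s] for s in "hec")
    return {s: counts[s] / total for s in "hec"}


for state, fraction in composition(PRD).items():
    print(state, round(fraction, 3))
```

Concatenating all PRD lines of a report before calling `composition` gives the figures for the whole peptide.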


Prosite for DKFZphutel_1811.3


PS00005   163->166   PKC_PHOSPHO_SITE    PDOC00005
PS00006   153->157   CK2_PHOSPHO_SITE    PDOC00006
PS00006   159->163   CK2_PHOSPHO_SITE    PDOC00006
PS00007   41->49     TYR_PHOSPHO_SITE    PDOC00007
PS00008   87->93     MYRISTYL            PDOC00008


Pfam for DKFZphutel_1811.3


HMM_NAME  Ribosomal protein S4

HMM        *MSR. YRGPRWKIIRRPGElPWLTnK tklmrkYC . . IRPgQHgWR
           M+R ++ +++K+++++++L W ++++R Y R+++ ++
Query    1 MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYN 49

HMM        qRk tLsKI RRmSQYr I RLQEKQKLRFMYGNI tERQLRRYvRidEdKRKlD
           Q + +R +++ + L+E + +R +++++L++++ +++ L
Query   50 QLSR--AVRELARRLRDLPERDQFRVRASAALLDKLYALGLVP-TRGSLE 96

HMM        YsTGenLMQILEMRLDNIVFRMGMAPTIHHARQLINHRHIRVNdRIVNIP
           ++ + ++++RL++++ ++ MA ++A+ +++++H+RV++ +V++P
Query   97 LCDFVTASSFCRRRLPTVLLKLRMAQHLQAAVAFVEQGHVRVGPDVVTDP 146

HMM        SYiCRPNDilSIRDkqrMQsHIkWnieSPegrmRPNHLErNnkkYeGtIN
           ++++++ + +++++W++ S+ ++R+ + Y'+ +
Query  147 AFLVTRS M EDFVTWVDSSK IKRHVLEYNEERD 178

HMM        rllEReWiplklNElLVVEY*
           +++ +


DKFZphutel_19f19


group: transmembrane protein 

DKFZphutel_19f19 encodes a novel 204 amino acid protein with similarity to murine p24 protein.

Murine p24 is expressed only in brain, where it is localized exclusively in neurons. It seems
to be a neuron-specific membrane protein localized in intracellular organelles of highly
differentiated neural cells and may play a role in the neural organelle transport system. Like
p24, the novel protein contains 2 transmembrane regions, but it does not contain the sequence
homologous to the microtubule-binding domain of microtubule-associated proteins that is present in
p24.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.


similarity to mouse P24 protein;
membrane regions: 2

Summary: DKFZphutel_19f19 encodes a novel 204 amino acid protein with
similarity to mouse P24 protein.


similarity to mouse P24 protein

complete cDNA, complete cds, EST hits,
2 TM-domains

Sequenced by AGOWA

Locus: /map=14.8 cR from top of Chr20 linkage group
Insert length: 2042 bp

Poly A stretch at pos. 1958, polyadenylation signal at pos. 1940


1 GCAGGCAGAG AGATGAGGAA ACTGAGACCC AGAAAGGTGG AAGCACTTGT 

51 CTAAGGTCAC GCCTCCAGGA AGCAGTGTGT CCACGACTCC AGTCCAAGTG 

101 GTCAGGCTCC AGAGCCCACA GTCCCAGGGG TCCATGATGC CGAGCTGCAA 

151 TCGTTCCTGC AGCTGCAGCC GCGGCCCCAG CGTGGAGGAT GGCAAGTGGT 

201 ATGGGGTCCG CTCCTACCTG CACCTCTTCT ATGAGGACTG TGCAGGCACT 

251 GCTCTCAGCG ACGACCCTGA GGGACCTCCG GTCCTGTGCC CCCGCCGGCC 

301 CTGGCCCTCA CTGTGTTGGA AGATCAGCCT GTCCTCGGGG ACCCTGCTTC 

351 TGCTGCTGGG TGTGGCGGCT CTGACCACTG GCTATGCAGT GCCCCCCAAG 

401 CTGGAGGGCA TCGGTGAGGG TGAGTTCCTG GTGTTGGATC AGCGGGCAGC 

451 CGACTACAAC CAGGCCCTGG GCACCTGTCG CCTGGCAGGC ACAGCGCTCT 

501 GTGTGGCAGC TGGAGTTCTG CTCGCCATCT GCCTCTTCTG GGCCATGATA 

551 GGCTGGCTGA GCCAGGACAC CAAGGCAGAG CCCTTGGACC CCGAAGCCGA 

601 CAGCCACGTG GAGGTCTTCG GGGATGAGCC AGAGCAGCAG TTGTCACCCA 

651 TTTTCCGCAA TGCCAGTGGC CAGTCATGGT TCTCGCCACC CGCCAGCCCC 

701 TTTGGGCAAT CTTCTGTGCA GACTATCCAG CCCAAGAGGG ACTCCTGAGC 

751 TGCCCACATG GCCTAAGATG TGGGTCCTGG ATCCTTCCCC CTTCTCACCA 

801 TAACCCCCTC TCAGTGTTTC CCCAACTTCT CCCTTTAGAG CCCAACTCCA 

851 GGTCAAATCT GGAGCTCAAA TCCCAGTGCT CCCTCCCCAG GAGTGGGGCC 

901 CCAACTCTTC CAAGATACCA GCATTCCTCA AGTCCTCCCA AAACTTCCTA 

951 CCCACACCCT CTTCCCAAGG CCCTCAGGGG CAGAAAACAT CTCCTTCAAC 

1001 CCGTCCCCAC TCCTTCCTCT GCATGACCTT GGGCAAACCC TTGCCCTTTC 

1051 AAGCCATCAG CTCCTGCCTC TCTGCCATGA GGGCTTTGGA TCAGATTCCT 

1101 CTTCTCGCCA GGATGAGGAC ACGCACTGCC CTCCATAGAC ACAGATGAAG 

1151 GGGTGGGGGT CATTCAGCTC GAATGGGTCC CAGATGCTCA CTTGGCCTTT 

1201 CCCTGCAGGA TGAGTGAAGA CGTTTGCCTC TCACAGTGTG TCTTCTACCT 

1251 GCATTTTGGC ATCAGAGCCC CCCAGCCCAC CCACCACAGG CAATTACTAG 

1301 CCCTAGTTGA TAGGTGAGGT GGGTGAAGAA GGCTGGAGGT GACATGTCCG 

1351 AGGTCACACA ACAAAGCAGC ATGCAGGAAC TAGAAACACA TCTTCAGCCT 

1401 CCTCCTGGGC CAGCTCTTGT GCTACAGGTG GGGCGGAGCC AGCCCCTCAC 

1451 CTTCCTGGTT CCCTGAGGGT CCTCAGGGTG GAGGACAGGT TTGGCCCAGA 

1501 AAGACTAGCC AGAGGCCTGA TGGTCCCAGG TGGCTCTGGA TATACTTTGG 

1551 ATATGGATTT AAATGGTCTC TAAGAGCCGG GGGTAGGGGG CAGGAAAAGT 

1601 GGGTTGTCTT TGCCCCTCAA AGTCCACCTA CCTAGAAACC AAGCCCACGG 

1651 TCTTGGCCGT GACCCTGATA ATAAATGGGC TCTCTCAGAG GCGCCAGCCC 

1701 CTCCCTCCCC AGCCGGAGGC GTCATCTCTC TTCTGTACCA CTAGAGGGAG 

1751 CTCTGATGCA GCTGGAGAGC AGCGCTCAAG GCTCTCGCCC CTCCCCTCCC 

1801 TAACCCTTAC CTTCAGTCTC CACCAGCCTG AAGGGCCTCC TAGGGGATCC 

1851 TCAGGCGGCC CCCACCAGGG CACACCCTAC TGTCCTTGTG CCTCACGCCC 

1901 CCTCCTCATC CTGCACCCCT TCCATCCCAC CTTCCCTTTC AATAAACAGC 

1951 TGGGATGGAA AAAAAAAAAA AGAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2001 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA


BLAST Results


Entry HS417348 from database EMBL: 
human STS WI-14697. 
Length = 290
Minus Strand HSPs:

Score = 1254 (188.2 bits), Expect = 3.0e-50, P = 3.0e-50
Identities = 262/273 (95%)
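
The hit above is reported under "Minus Strand HSPs", meaning that the database entry aligns to the reverse complement of the insert. A minimal sketch of that operation (the function name is illustrative; only unambiguous bases are handled):

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


# First ten bases of the DKFZphutel_19f19 insert.
print(reverse_complement("GCAGGCAGAG"))
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting minus-strand coordinates.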


Medline entries 


97334404: 

A newly identified membrane protein localized exclusively in 
intracellular organelles of neurons. 


Peptide information for frame 2 


ORF from 134 bp to 745 bp; peptide length: 204 
Category: similarity to known protein 
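
The ORF coordinates, the reading frame and the peptide length quoted above are related by simple arithmetic. The sketch below assumes 1-based inclusive coordinates with the stop codon excluded; this is consistent with the figures in this record but is an assumption rather than a stated convention, and the function names are illustrative.

```python
def reading_frame(orf_start: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based ORF start."""
    return (orf_start - 1) % 3 + 1


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF that excludes the stop codon."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


# Values quoted for DKFZphutel_19f19: ORF from 134 bp to 745 bp.
print(reading_frame(134), peptide_length(134, 745))
```

For this clone the result is frame 2 and 204 residues, matching the peptide information above.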


1 MMPSCNRSCS CSRGPSVEDG KWYGVRSYLH LFYEDCAGTA LSDDPEGPPV 
51 LCPRRPWPSL CWKISLSSGT LLLLLGVAAL TTGYAVPPKL EGIGEGEFLV 
101 LDQRAADYNQ ALGTCRLAGT ALCVAAGVLL AICLFWAMIG WLSQDTKAEP 
151 LDPEADSHVE VFGDEPEQQL SPIFRNASGQ SWFSPPASPF GQSSVQTIQP
201 KRDS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19f19, frame 2

TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein,
complete cds., N = 1, Score = 295, P = 3.8e-26


>TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, 
complete cds. 

Length = 196 

HSPs: 

Score = 295 (44.3 bits), Expect = 3.8e-26, P = 3.8e-26
Identities = 58/139 (41%), Positives = 81/139 (58%)

Query: 2 MPSCNRSCSCSRGPSVEDGKW---YGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWP 58

M SC+ +C R + +G + YGVRSYLH FYEDC + + + P R W

Sbjct: 1 MTSCSNTCGSRRAQADTEGGYQQRYGVRSYLHQFYEDCTASIWEYEDDFQIQRSPNR-WS 59

Query: 59 SLCWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLA 118 

S+ WK+ L SGT+ ++LG+ L G+ VPPK+E GE +F+V+D A YN AL TC+LA 
Sbjct: 60 SVFWKVGLISGTVFVILGLTVLAVGFLVPPKIEAFGEADFMVVDTHAVKYNGALDTCKLA 119 

Query: 119 GTALCVAAGVLLAICLFWAM 138 

G L G +A CL ++ 
Sbjct: 120 GAVLFCIGGTSMAGCLLMSV 139 
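
The Expect and P values reported for this HSP are numerically identical. That is what one would expect if P is derived from E by the usual Poisson relation P = 1 - exp(-E), which is assumed here rather than taken from the report. A minimal sketch:

```python
import math


def p_from_expect(expect: float) -> float:
    """P = 1 - exp(-E), evaluated with expm1 to keep precision for small E."""
    return -math.expm1(-expect)


print(p_from_expect(3.8e-26))  # very small E: P is practically equal to E
print(p_from_expect(1.0))      # E = 1: P is noticeably smaller than E
```

Using `math.expm1` avoids the loss of precision that `1 - math.exp(-expect)` would suffer for values as small as those in this record.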


Pedant information for DKFZphutel_19f19, frame 2


Report for DKFZphutel_19f19.2

[LENGTH] 204
[MW] 21983.07
[pI] 4.69
[HOMOL] TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, complete cds. 7e-19
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 2
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 10.29 %


SEQ MMPSCNRSCSCSRGPSVEDGKWYGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWPSL
SEG
PRD cccccccccccccccccccccceeehhhhhccccccccccccccccccccccccccccce
MEM MM

SEQ CWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLAGT
SEG . . . . xxxxxxxxxxxxxxxxxxxxx
PRD eeeGeccccceeecccceeeecccccccccccccccceeeecccccccchhhhhhhhchh
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMM

SEQ ALCVAAGVLLAICLFWAMIGWLSQDTKAEPLDPEADSHVEVFGDEPEQQLSPIFRNASGQ
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccccccceeeeccccccccccccccccccc
MEM MMMMMMMMMMMMMMMMMMMMMM

SEQ SWFSPPASPFGQSSVQTIQPKRDS
SEG
PRD ccccccccccccceeeeccccccc


Prosite for DKFZphutel_19f19.2


PS00001   6->10      ASN_GLYCOSYLATION   PDOC00001
PS00001   176->180   ASN_GLYCOSYLATION   PDOC00001
PS00004   201->205   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   114->117   PKC_PHOSPHO_SITE    PDOC00005
PS00006   16->20     CK2_PHOSPHO_SITE    PDOC00006
PS00006   146->150   CK2_PHOSPHO_SITE    PDOC00006
PS00006   157->161   CK2_PHOSPHO_SITE    PDOC00006
PS00008   38->44     MYRISTYL            PDOC00008
PS00008   92->98     MYRISTYL            PDOC00008
PS00008   119->125   MYRISTYL            PDOC00008
PS00008   127->133   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphutel_19f19.2)


DKFZphutel_19g19


group: uterus derived 

DKFZphutel_19g19 encodes a novel 400 amino acid protein with strong but partial similarity
to a bovine elastin-related protein expressed in fetal calf ligamentum nuchae.

The novel protein contains 2 RGD cell attachment sites.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.


similarity to bovine elastin fragment 
complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 

Locus: map=54.9 cR from top of Chr3 linkage group 
Insert length: 3244 bp 

Poly A stretch at pos. 3227, polyadenylation signal at pos. 3216 


1 GTAACTGCAG TAAGTCCCGC TTGGCCCTGG AGTCCACGCG GATTTTCGAA 
51 GCTGGGGCTG GCAAGAGGCC GCTGGACACC ACGCTCCAGT CGTCAGCCCA 
101 CTTCCTAGCT GAACAGCGCG AGGCGGCGGC AGCGAGCCGG GTCCCACCAT 
151 GGCCGCGAAT TATTCCAGTA CCAGTACCCG GAGAGAACAT GTCAAAGTTA 
201 AAACCAGCTC CCAGCCAGGC TTCCTGGAAC GGCTGAGCGA GACCTCGGGT 
251 GGGATGTTTG TGGGGCTCAT GGCCTTCCTG CTCTCCTTCT ACCTAATTTT 
301 CACCAATGAG GGCCGCGCAT TGAAGACGGC AACCTCATTG GCTGAGGGGC 
351 TCTCGCTTGT GGTGTCTCCT GACAGCATCC ACAGTGTGGC TCCGGAGAAT 
401 GAAGGAAGGC TGGTGCACAT CATTGGCGCC TTACGGACAT CCAAGCTTTT 
451 GTCTGATCCA AACTATGGGG TCCATCTTCC GGCTGTGAAA CTGCGGAGGC 
501 ACGTGGAGAT GTACCAATGG GTAGAAACTG AGGAGTCCAG GGAGTACACC 
551 GAGGATGGGC AGGTGAAGAA GGAGACGAGG TATTCCTACA ACACTGAATG 
601 GAGGTCAGAA ATCATCAACA GCAAAAACTT CGACCGAGAG ATTGGCCACA 
651 ATAACCCCAG TGCCATGGCA GTGGAGTCAT TCACGGCAAC AGCCCCCTTT 
701 GTCCAAATTG GCAGGTTTTT CCTCTCGTCA GGCCTCATCG ACAAAGTCGA
751 CAACTTCAAG TCCCTGAGCC TATCCAAGCT GGAGGACCCT CATGTGGACA 
801 TCATTCGCCG TGGAGACTTT TTCTACCACA GCGAAAATCC CAAGTATCCA 
851 GAGGTGGGAG ACTTGCGTGT CTCCTTTTCC TATGCTGGAC TGAGCGGCGA 
901 TGACCCTGAC CTGGGCCCAG CTCACGTGGT CACTGTGATT GCCCGGCAGC 
951 GGGGTGACCA GCTAGTCCCA TTCTCCACCA AGTCTGGGGA TACCTTACTG 
1001 CTCCTGCACC ACGGGGACTT CTCAGCAGAG GAGGTGTTTC ATAGAGAACT 
1051 AAGGAGCAAC TCCATGAAGA CCTGGGGCCT GCGGGCAGCT GGCTGGATGG 
1101 CCATGTTCAT GGGCCTCAAC CTTATGACAC GGATCCTCTA CACCTTGGTG 
1151 GACTGGTTTC CTGTTTTCCG AGACCTGGTC AACATTGGCC TGAAAGCCTT 
1201 TGCCTTCTGT GTGGCCACCT CGCTGACCCT GCTGACCGTG GCGGCTGGCT 
1251 GGCTCTTCTA CCGACCCCTG TGGGCCCTCC TCATTGCCGG CCTGGCCCTT 
1301 GTGCCCATCC TTGTTGCTCG GACACGGGTG CCAGCCAAAA AGTTGGAGTG 
1351 AAAAGACCCT GGCACCCGCC CGACACCTGC GTGAGCCCTA GGATCCAGGT 
1401 CCTCTCTCAC CTCTGACCCA GCTCCATGCC AGAGCAGGAG CCCCGGTCAA 
1451 TTTTGGACTC TGCACCCCCT CTCCTCTTCA GGGGCCAGAC TTGGCAGCAT 
1501 GTGCACCAGG TTGGTGTTCA CCAGCTCATG TCTTCCCCAC ATCTCTTCTT 
1551 GCCAGTAAGC AGCTTTGGTG GGCAGCAGCA GCCATGAATG GCAAGCTGAC 
1601 AGCTTCTCCT GCTGTTTCCT TCCTCTCTTG GACTGAGTGG GTACGGCCAG 
1651 CCACTCAGCC CATTGGCAGC TGACAACGCA GACACGCTCT ACGGAGGCCT
1701 GCTGATAAAG GGCTCAGCCT TGCCGTGTGC TGCTTCTCAT CACTGCACAC 
1751 AAGTGCCATG CTTTGCCACC ACCACCAAGC ACATCTGTGA TCCTGAAGGG 
1801 CGGCCGTTAG TCATTACTGC TGAGTCCTGG GTCACCAGCA GACACACTGG 
1851 GCATGGACCC CTCAAAGCAG GCACACCCAA AACACAAGTC TGTGGCTAGA 
1901 ACCTGATGTG GTGTTTAAAA GAGAAGAAAC ACTGAAGATG TCCTGAGGAG 
1951 AAAAGCTGGA CATATACTGG GCTTCACACT TATCTTATGG CTTGGCAGAA 
2001 TCTTTGTAGT GTGTGGGATC TCTGAAGGCC CTATTTAAGT TTTTCTTCGT 
2051 TACTTTGCTG CTTCATGTGT ACTTTCCTAC CCCAAGAGGA AGTTTTCTGA 
2101 AATAAGATTT AAAAACAAAA CAAAAAAAAC ACTTAATATT TCAGACTGTT 
2151 ACAGGAAACA CCCTTTAGTC TGTCAGTTGA ATTCAGAGCA CTGAAAGGTG 
2201 TTAAATTGGG GTATGTGGTT TGATTGATAA AAAGTTACCT CTCAGTATTT 
2251 TGTGTCACTG AGAAGCTTTA CAATGGATGC TTTTGAAACA AGTATCAGCA 
2301 AAAGGATTTG TTTTCACTCT GGGAGGAGAG GGTGGAGAAA GCACTTGCTT 
2351 TCATCCTCTG GCATCGGAAA CTCCCCTATG CACTTGAAGA TGGTTTAAAA 
2401 GATTAAAGAA ACGATTAAGA GAAAAGGTTG GAAGCTTTAT ACTAAATGGG 
2451 CTCCTTCATG GTGACGCCCC GTCAACCACA ATCAAGAACT GAGGCCTGAG 
2501 GCTGGTTGTA CAATGCCCAC GCCTGCCTGG CTGCTTTCAC CTGGGAGTGC 
2551 TTTCGATGTG GGCACCTGGG CTTCCTAGGG CTGCTTCTGA GTGGTTCTTT 
2601 CACGTGTTGT GTCCATAGCT TTAGTCTTCC TAAATAAGAT CCACCCACAC
2651 CTAAGTCACA GAATTTCTAA GTTCCCCAAC TACTCTCACA CCCTTTTAAA
2701 GATAAAGTAT GTTGTAACCA GGATGTCTTA AATGATTCTT TGTGTACCTT
2751 TTCTGTCATA TTCAGAAACC GTTTTGTGCC TGCTGGGAGT AATTCCTTTA
2801 GCAATTAAGT ATTTGGTAGC TGAATAAGGG GTCAGAACTT CTGAAACCAG
2851 AGATCTGTAA TCATCTCTAT TGGCCTGGGG TGCCTGTGCT ATAAATGAGT
2901 TTCTTCACAT GAAAAACACA GCCAGCCCAA GATGACTTAT CTGGGTTTAG
2951 GATTCAATAG TATTCACTAA CTGCTTATTA CATGAGCAAT TTCATCAAAT
3001 CTCCAAACTC TTAAAGGATG CTTTCGGAAA ACACGCTGTA TACCTAGATG
3051 ATGACTAAAT GCAAAATCCT TGGGCTTTGG TTTTTTTCTA GTAAGGATTT
3101 TAAATAACTG CCGACTTCAA AAGTGTTCTT AAAACGAAAG ATAATGTTAA
3151 GAAAAATTTG AAAGCTTTGG AAAACCAAAT TTGTAATATC ATTGTATTTT
3201 TTATTAAAAG TTTTGTAATA AATTTCTAAA AAAAAAAAAA AAAA


BLAST Results 


Entry HS545355 from database EMBL: 
human STS WI-14815. 
Length = 436 
Minus Strand HSPs: 

Score = 2040 (306.1 bits), Expect = 6.2e-86, P = 6.2e-86
Identities = 420/426 (98%)

Entry HS932147 from database EMBL:
human STS WI-8531.
Length = 341
Minus Strand HSPs:

Score = 1705 (255.8 bits), Expect = 4.7e-70, P = 4.7e-70
Identities = 341/341 (100%)


Medline entries 


86051793: 

Bovine elastin cDNA clones: evidence for the occurrence of a 
new elastin-related protein in fetal calf ligamentum nuchae.


Peptide information for frame 2 


ORF from 149 bp to 1346 bp; peptide length: 400 
Category: similarity to known protein 


1 MAANYSSTST RREHVKVKTS SQPGFLERLS ETSGGMFVGL MAFLLSFYLI 
51 FTNEGRALKT ATSLAEGLSL VVSPDSIHSV APENEGRLVH IIGALRTSKL
101 LSDPNYGVHL PAVKLRRHVE MYQWVETEES REYTEDGQVK KETRYSYNTE 
151 WRSEIINSKN FDREIGHNNP SAMAVESFTA TAPFVQIGRF FLSSGLIDKV 
201 DNFKSLSLSK LEDPHVDIIR RGDFFYHSEN PKYPEVGDLR VSFSYAGLSG 
251 DDPDLGPAHV VTVIARQRGD QLVPFSTKSG DTLLLLHHGD FSAEEVFHRE 
301 LRSNSMKTWG LRAAGWMAMF MGLNLMTRIL YTLVDWFPVF RDLVNIGLKA
351 FAFCVATSLT LLTVAAGWLF YRPLWALLIA GLALVPILVA RTRVPAKKLE

BLASTP hits

Entry I45887 from database PIR:
elastin - bovine (fragment)
Length = 40

Score = 131 (46.1 bits), Expect = 4.9e-08, P = 4.9e-08
Identities = 31/41 (75%), Positives = 34/41 (82%)


Alert BLASTP hits for DKFZphutel_19g19, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_19g19, frame 2


Report for DKFZphutel_19g19.2


[LENGTH] 400
[MW] 44831.53
[pI] 7.23
[HOMOL] PIR:I45887 elastin - bovine (fragment) 1e-06
[PROSITE] RGD 2
[PROSITE] MYRISTYL 3
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 5
[PROSITE] ASN_GLYCOSYLATION 1
[KW] TRANSMEMBRANE 4


SEQ MAANYSSTSTRREHVKVKTSSQPGFLERLSETSGGMFVGLMAFLLSFYLIFTNEGRALKT
PRD ccceeecccceeeeeeeecccccceeeecccccccchhhhhhhhhhheeeeecccchhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . .

SEQ ATSLAEGLSLVVSPDSIHSVAPENEGRLVHIIGALRTSKLLSDPNYGVHLPAVKLRRHVE
PRD hhhhhccceeeeccccceeeeccccceeeeeeeeeeceeeccccccccccchhhhhhhhh
MEM

SEQ MYQWVETEESREYTEDGQVKKETRYSYNTEWRSEIINSKNFDREIGHNNPSAMAVESFTA
PRD hheeehhhhheeecccccccceeeccccccceeeeeeccccceeecccccceeeeeeecc
MEM M

SEQ TAPFVQIGRFFLSSGLIDKVDNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLR
PRD ccceeeeeeeeeccccccccccceeeeeeeccccceeeeecccceeecccccccccccee
MEM MMMMMMMMMMMMMMMMM

SEQ VSFSYAGLSGDDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE
PRD eeccccccccccccccceeeeeeeeecccccccccccccceeeeeecccccchhhhhhhh
MEM

SEQ LRSNSMKTWGLRAAGWMAMFMGLNLMTRILYTLVDWFPVFRDLVNIGLKAFAFCVATSLT
PRD hhccccccccchhhhhhhhhhhchhhhhhhhheeecccccccccccceeeeeeeeehhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMM MMMM

SEQ LLTVAAGWLFYRPLWALLIAGLALVPILVARTRVPAKKLE
PRD hhhhhccceeehhhhhhhhhhhhchhhhhhhhcccccccc
MEM MMMMMMMMMMMMMMMMMMMMMMMMM


Prosite for DKFZphutel_19g19.2


PS00001   4->8       ASN_GLYCOSYLATION   PDOC00001
PS00004   140->144   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   9->12      PKC_PHOSPHO_SITE    PDOC00005
PS00005   10->13     PKC_PHOSPHO_SITE    PDOC00005
PS00005   97->100    PKC_PHOSPHO_SITE    PDOC00005
PS00005   276->279   PKC_PHOSPHO_SITE    PDOC00005
PS00005   305->308   PKC_PHOSPHO_SITE    PDOC00005
PS00006   10->14     CK2_PHOSPHO_SITE    PDOC00006
PS00006   63->67     CK2_PHOSPHO_SITE    PDOC00006
PS00006   209->213   CK2_PHOSPHO_SITE    PDOC00006
PS00006   249->253   CK2_PHOSPHO_SITE    PDOC00006
PS00006   292->296   CK2_PHOSPHO_SITE    PDOC00006
PS00006   332->336   CK2_PHOSPHO_SITE    PDOC00006
PS00007   220->227   TYR_PHOSPHO_SITE    PDOC00007
PS00007   99->107    TYR_PHOSPHO_SITE    PDOC00007
PS00008   35->41     MYRISTYL            PDOC00008
PS00008   93->99     MYRISTYL            PDOC00008
PS00008   310->316   MYRISTYL            PDOC00008
PS00016   221->224   RGD                 PDOC00016
PS00016   268->271   RGD                 PDOC00016
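
The Prosite hits listed for this peptide can be reproduced by scanning the sequence with the corresponding patterns. The sketch below hand-translates two of them into regular expressions (PS00001, N-{P}-[ST]-{P}, and PS00016, R-G-D) and reports hits as a 1-based start with an exclusive end, which is the convention the positions above appear to follow. The function name and the use of sequence fragments are illustrative assumptions.

```python
import re

# PROSITE patterns hand-translated to Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "RGD": r"RGD",                          # PS00016: R-G-D
}


def scan(peptide: str, offset: int = 0) -> list:
    """Return (name, start, end) hits; start is 1-based, end is exclusive."""
    hits = []
    for name, pattern in PATTERNS.items():
        # The lookahead makes overlapping matches visible as well.
        for m in re.finditer(f"(?=({pattern}))", peptide):
            start = offset + m.start() + 1
            hits.append((name, start, start + len(m.group(1))))
    return sorted(hits, key=lambda h: h[1])


N_TERMINUS = "MAANYSSTSTRREHVKVKTS"  # residues 1 to 20
MIDDLE = (  # residues 201 to 280
    "DNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLR"
    "VSFSYAGLSGDDPDLGPAHVVTVIARQRGDQLVPFSTKSG"
)
print(scan(N_TERMINUS))
print(scan(MIDDLE, offset=200))
```

With these fragments the scan finds the N-glycosylation site at 4->8 and the two RGD sites at 221->224 and 268->271, as listed for DKFZphutel_19g19.2.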


(No Pfam data available for DKFZphutel_19g19.2)


DKFZphutel_19g22


group: cell structure and motility 

DKFZphutel_19g22 encodes a novel 390 amino acid protein with very strong similarity to
tuftelin/enamelin.

Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved in
calcification, these proteins are also expressed in the uterus matrix.

The new protein can find application in modulation of tissue calcification, especially in the
uterus.


complete cDNA, complete cds start at Bp 51, EST hits in 3' UTR,
human homolog of mouse tuftelin

tuftelin is described as a matrix protein of teeth but it seems also
to be present in the uterus matrix

Sequenced by AGOWA

Locus: unknown

Insert length: 3110 bp

Poly A stretch at pos. 3093, polyadenylation signal at pos. 3071


1 GCAGACAGCG GGGTGGACAA GTGGCGTGTG TGCTGCGACC CCGAGGGAAG 
51 ATGAACGGGA CGCGGAACTG GTGTACCCTG GTGGACGTGC ACCCAGAGGA 
101 CCAGGCGGCG GGCAGCGTGG ACATTCTCAG GCTGACTCTC CAGGGTGAAC 
151 TGACAGGAGA TGAACTTGAA CACATAGCCC AGAAGGCGGG CAGGAAGACC 
201 TATGCCATGG TGTCCAGCCA CTCAGCTGGT CATTCTCTGG CTTCAGAACT 
251 GGTGGAGTCC CATGATGGAC ATGAGGAGAT CATTAAGGTG TACTTGAAGG 
301 GGAGGTCTGG AGACAAGATG ATTCACGAGA AGAATATTAA CCAGCTGAAG 
351 AGTGAGGTCC AGTACATCCA GGAGGCCAGG AACTGCCTAC AGAAGCTCCG 
401 GGAGGATATA AGTAGCAAGC TTGACAGGAA CCTAGGAGAT TCTCTCCATC 
451 GACAGGAGAT ACAGGTGGTG CTAGAAAAGC CAAATGGCTT TAGTCAGAGT 
501 CCCACAGCCC TGTACAGCAG CCCACCTGAG GTGGACACCT GTATAAATGA 
551 GGATGTTGAG AGCTTGAGGA AGACGGTGCA GGACTTGCTG GCCAAGCTTC 
601 AGGAGGCCAA GCGGCAACAC CAGTCAGACT GTGTGGCTTT TGAGGTCACA 
651 CTCAGCCGGT ACCAGAGGGA AGCAGAACAA AGTAATGTGG CCCTTCAGAG 
701 AGAGGAGGAC AGAGTGGAGC AGAAAGAGGC AGAAGTCGGA GAGCTGCAGA 
751 GGCGCTTGCT AGGGATGGAG ACGGAGCATC AGGCCTTACT GGCGAAAGTG 
801 AGGGAAGGGG AGGTGGCCCT AGAGGAACTT CGGAGCAACA ATGCTGACTG 
851 CCAAGCAGAA CGAGAAAAGG CTGCTACCCT GGAAAAGGAA GTGGCCGGGT 
901 TGCGGGAGAA GATCCACCAC TTGGATGACA TGCTCAAGAG CCAGCAGCGG 
951 AAAGTCCGGC AAATGATAGA GCAGCTCCAG AATTCAAAAG CTGTGATCCA 
1001 GTCAAAGGAC GCCACCATCC AGGAGCTCAA GGAGAAAATC GCCTATCTGG 
1051 AGGCAGAGAA TTTAGAGATG CATGACCGGA TGGAACACCT GATAGAAAAA 
1101 CAAATCAGTC ATGGCAACTT CAGCACCCAG GCCCGGGCCA AGACAGAGAA 
1151 CCCGGGCAGT ATTAGGATAT CCAAGCCGCC TAGCCCGAAG CCCATGCCTG 
1201 TCATCCGAGT GGTGGAAACC TGAGCTGCCT GGAGATGGTT GCTGCCATTG 
1251 CTGCTGCCTC TGCCTCGGAG AAGCCCACTG CCCCTGTTGG CTGTTAACAC 
1301 TGCCTTTGAC TTCCTGACTG TCCCCTGGCT GCACCCAGGA CTTCGGGCTC 
1351 CTGTGTCTCA CCATTCCCAA GCCCCTGGCC ACTCTAAGCT GGGCAGACGG 
1401 AGCACGAGCA CCTATTCAAG GCACTGCAGC CCTTTGGAAG ACATTGTCCT
1451 GCAAGCAGGA GCCAGGGCAA TATCTATATT CCTACAGTGA CTATTTTTCT 
1501 CTGTAGAGAG CCTCCCTTCT GTTGTAGACT GGACTCTGGC TGCGCCATAA 
1551 GCCAGGCCTT CATCAGATTG GGAGAGGTGA CAAGATTTGC CTCAGCCCTA 
1601 AAAGCTGGAG ACACAGATGT CCAGAGTGAT TGGAGAATGT CCTGGGGGAA 
1651 TGAAGTTCCT TCCACAAACA CAGCTCAGTT CTTAGCAACA AACTGTTTGT 
1701 TTTTCTACTT GCTCCATCTG CAGCCTACGC TGCCCTGGCC TCCTGCAGAC 
1751 AGATAGTGGG GTTACCTGGC AAGGCCTGGT GAGAGCCAGT GAACCTAAGC 
1801 TTTGACTGGG TGGCCTTGTC TTTCTGGGGA GGAGGGAATG TACATTCAGG 
1851 GAGTAGCCTT TTGCGGAAAA ATTCTCTAGG GCTACAGACA GTCATGTGTG 
1901 ACTTCTCTCT GCTGTGAAAA CTCCCAGAGT CTCTTTAGGG ATTTTCCCTA 
1951 AGGTGTACCA CCAGGCACAC CTCAGTCTTC TTGACCCAGA GCCTGAAAAC 
2001 TGTTTTCACT GGGTTCCACC AGTCCCAGCA AAATCCTCTT TGTATTTATT 
2051 TTGCTAAGTT ATTGGTGGTT TTGCTTACAT CTCATGATTG ATATAATACC 
2101 AAAGTTCTAT AGCCTTCTCT TGCAGTATTT GGATTTGCTT GAAACCGGGA 
2151 AAACTGTTCC CATTAGGCTT GTTAATGTCA GAGTGACACT ATTATGAATC 
2201 TTTCTCTCCC TTTCCTCTGC CTGTTTCTTC TCTCTTTCTC CTTCAAACTT 
2251 GCTCTGCAGC TAAGGAAGGT GAGTCTACTT TCCCTGAGGC TTTGGGGTCA
2301 GAGTATATGT TGTTTGGAGA AAGAGGGCAA TCAGGACTCT TCTGGGACCC
2351 AGATGAGTTC TTCACTAGCC CTTCTGAACC CCTTGCTCCA TAATTGGTCT 
2401 TTTATCCTGG CTCTGAATGA CCCTGCAGGT CATCATGGTT TTCTTTTTTT 
2451 ATTGTTTTTT TTTTTTTCTG AGACAGAGTC TCACTCTGTC ACCCAGGCTG 
2501 GAGTGCAGTG GCGCGATCTC AGCTCACTGC AACCTCTGCC TCCCGGATTT 
2551 AAGCGATTCT TCTGCCTCAG CCTCCCGAGT AGCTGGGACT ACAGGTGTGC 


WO 01/12659 PCT/IB00/01496


2601 CACCACGCCT GGCTGATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA
2651 TACTGGCTAG GCTGGTCTCG AATTCCTGAC CTCAGGTGAT CCACCCACCT
2701 CGGCTTCCCA AAGTGCTAGG ATTATAGGCT TGAGCTACTG TGCCCGGCCC
2751 ATGGTGTTTT TCTTTAGGGC TCTTCCTACA GCCTTGAGAA GTAGATAGGC
2801 ATCAGAGTAT GGTACTATAG GAATCAGAAA AATTCAAAAC AAATGTGGAT
2851 TAAGTGTTTA GGCTCTATGT GGCTCACGCA GCCAGAATCC TTAAGTCTGT
2901 GTGTTTCTGT GTCTCAAGAC TGGGCTCACA TTCTGGCTTT GTCCATAACA
2951 ATGCTCTGGG ATTTCAGGGA GTTCCCTCAT TTGTAAAATG AGGGGGTCAG
3001 AGCAGGTGAT ATCCATGTTT CTTCCCTTTC TGATATTGTT GTCTGTGGCA
3051 TATTCTTTGT ATGGCGAATT TAATAAATTA TATT/UiTGTG TCTAAAAAAA
3101 AAAAAAAAAA


BLAST Results 


No BLAST result 


Medline entries 


98200312:

Tuftelin - aspects of protein and gene structure

97228909:

... expression of enamel gene products during mouse tooth development

913^0750:

... bovine enamelin ("tuftelin") a novel acidic enamel protein


Peptide information for frame 3 


ORF from 51 bp to 1220 bp; peptide length: 390
Category: strong similarity to known protein 
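
The relation between the ORF coordinates and the stated peptide length can be checked with a short calculation. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the function name `peptide_length` is illustrative only and does not come from the annotation software.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: the stop codon is not part of the stated range.
    """
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# ORF from 51 bp to 1220 bp: 1170 bp, i.e. 390 codons
print(peptide_length(51, 1220))
```

Under these assumptions the 1170 bp span corresponds to the 390 residues reported for frame 3.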


1 MNGTRNWCTL VDVHPEDQAA GSVDILRLTL QGELTGDELE HIAQKAGRKT
51 YAMVSSHSAG HSLASELVES HDGHEEIIKV YLKGRSGDKM IHEKNINQLK
101 SEVQYIQEAR NCLQKLREDI SSKLDRNLGD SLHRQEIQVV LEKPNGFSQS
151 PTALYSSPPE VDTCINEDVE SLRKTVQDLL AKLQEAKRQH QSDCVAFEVT
201 LSRYQREAEQ SNVALQREED RVEQKEAEVG ELQRRLLGME TEHQALLAKV
251 REGEVALEEL RSNNADCQAE REKAATLEKE VAGLREKIHH LDDMLKSQQR
301 KVRQMIEQLQ NSKAVIQSKD ATIQELKEKI AYLEAENLEM HDRMEHLIEK
351 QISHGNFSTQ ARAKTENPGS IRISKPPSPK PMPVIRVVET


BLASTP hits


No BLASTP hits available


Alert BLASTP hits for DKFZphutel_19g22, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphutel_19g22, frame 3


Report for DKFZphutel_19g22.3


[LENGTH] 390
[MW] 44264.09
[pI] 5.68
[HOMOL] TREMBL:AF047704_1 product: "tuftelin"; Mus musculus tuftelin mRNA, complete cds. 0.0
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 7e-11
[FUNCAT] nVll ni'^r??^^^ ?^ chromosome structure [S. cerevisiae, YLR086w] 1e-08
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, Yrroftfi^i H no
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGLSIewree-SB
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 7e-08
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 7e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-07
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR285w] 2e-07
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-07
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 1e-05
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 4e-04
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YMR294w] 7e-04
[EC] 3.6.1.32 Myosin ATPase 8e-09
[PIRKW] blocked amino end 1e-07
[PIRKW] nucleus 1e-06
[PIRKW] citrulline 1e-07
[PIRKW] tandem repeat 8e-09
[PIRKW] heterodimer 3e-06
[PIRKW] DNA repair 2e-06
[PIRKW] heart 8e-09
[PIRKW] endocytosis 3e-07
[PIRKW] transmembrane protein 4e-10
[PIRKW] zinc finger 3e-07
[PIRKW] metal binding 3e-07
[PIRKW] muscle contraction 8e-09
[PIRKW] acetylated amino end 1e-06
[PIRKW] actin binding 8e-09
[PIRKW] microtubule binding 1e-06
[PIRKW] cell division control 1e-06
[PIRKW] ATP 8e-09
[PIRKW] chromosomal protein 3e-06
[PIRKW] thick filament 8e-09
[PIRKW] phosphoprotein 1e-145
[PIRKW] Skeletal muscle 8e-09
[PIRKW] calcium binding 1e-07
[PIRKW] meiosis 2e-06
[PIRKW] alternative splicing 7e-08
[PIRKW] DNA condensation 3e-06
[PIRKW] coiled coil 4e-10
[PIRKW] P-loop 8e-09
[PIRKW] heptad repeat 1e-07
[PIRKW] methylated amino acid 8e-09
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 3e-07
[PIRKW] cardiac muscle 8e-09
[PIRKW] hydrolase 8e-09
[PIRKW] muscle 7e-08
[PIRKW] EF hand 1e-07
[PIRKW] cytoskeleton 7e-08
[PIRKW] hair 1e-07
[PIRKW] smooth muscle 7e-06
[PIRKW] calmodulin binding 3e-07
[SUPFAM] conserved hypothetical P115 protein 2e-09
[SUPFAM] myosin heavy chain 8e-09
[SUPFAM] RAD50 protein 2e-06
[SUPFAM] calmodulin repeat homology 1e-07
[SUPFAM] myosin motor domain homology 8e-09
[SUPFAM] alpha-actinin actin-binding domain homology 1e-06
[SUPFAM] tropomyosin 7e-08
[SUPFAM] protein-tyrosine kinase ret 3e-07
[SUPFAM] plectin 1e-06
[SUPFAM] trichohyalin 1e-07
[SUPFAM] pleckstrin repeat homology 2e-06
[SUPFAM] ribosomal protein S10 homology 1e-06
[SUPFAM] protein kinase homology 3e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06
[SUPFAM] giantin 4e-06
[SUPFAM] kinesin-related protein KLPA 1e-06
[SUPFAM] kinesin motor domain homology 1e-06
[SUPFAM] human early endosome antigen 1 3e-07
[SUPFAM] M5 protein 2e-06
[PROSITE] MYRISTYL 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.62 %
[KW] COILED_COIL 35.13 %


SEQ   MNGTRNWCTLVDVHPEDQAAGSVDILRLTLQGELTGDELEHIAQKAGRKTYAMVSSHSAG
SEG
PRD   cccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ   HSLASELVESHDGHEEIIKVYLKGRSGDKMIHEKNINQLKSEVQYIQEARNCLQKLREDI
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ   SSKLDRNLGDSLHRQEIQVVLEKPNGFSQSPTALYSSPPEVDTCINEDVESLRKTVQDLL
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCC

SEQ   AKLQEAKRQHQSDCVAFEVTLSRYQREAEQSNVALQREEDRVEQKEAEVGELQRRLLGME
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ   TEHQALLAKVREGEVALEELRSNNADCQAEREKAATLEKEVAGLREKIHHLDDMLKSQQR
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ   KVRQMIEQLQNSKAVIQSKDATIQELKEKIAYLEAENLEMHDRMEHLIEKQISHGNFSTQ
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ   ARAKTENPGSIRISKPPSPKPMPVIRVVET
SEG   xxxxxxxxxxxxxxxxxx . . .
PRD   hhcccccccceeeecccccccccceeeccc
COILS


Prosite for DKFZphutel_19g22.3


PS00001    2->6      ASN_GLYCOSYLATION   PDOC00001
PS00001    356->360  ASN_GLYCOSYLATION   PDOC00001
PS00005    121->124  PKC_PHOSPHO_SITE    PDOC00005
PS00005    171->174  PKC_PHOSPHO_SITE    PDOC00005
PS00005    370->373  PKC_PHOSPHO_SITE    PDOC00005
PS00005    378->381  PKC_PHOSPHO_SITE    PDOC00005
PS00006    9->13     CK2_PHOSPHO_SITE    PDOC00006
PS00006    35->39    CK2_PHOSPHO_SITE    PDOC00006
PS00006    122->126  CK2_PHOSPHO_SITE    PDOC00006
PS00006    157->161  CK2_PHOSPHO_SITE    PDOC00006
PS00006    175->179  CK2_PHOSPHO_SITE    PDOC00006
PS00006    322->326  CK2_PHOSPHO_SITE    PDOC00006
PS00008    355->361  MYRISTYL            PDOC00008
PS00009    46->50    AMIDATION           PDOC00009


(No Pfam data available for DKFZphutel_19g22 . 3) 
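
The two ASN_GLYCOSYLATION hits in the Prosite table can be reproduced with a regular expression. The sketch below assumes the PS00001 pattern N-{P}-[ST]-{P} and assumes that the table writes each hit as start->end with the end position exclusive; the names `PEPTIDE` and `find_n_glycosylation` are illustrative, and the sequence is the frame 3 peptide given above.

```python
import re

# Frame 3 peptide of DKFZphutel_19g22 (390 residues), one string per 50-residue row.
PEPTIDE = (
    "MNGTRNWCTLVDVHPEDQAAGSVDILRLTLQGELTGDELEHIAQKAGRKT"
    "YAMVSSHSAGHSLASELVESHDGHEEIIKVYLKGRSGDKMIHEKNINQLK"
    "SEVQYIQEARNCLQKLREDISSKLDRNLGDSLHRQEIQVVLEKPNGFSQS"
    "PTALYSSPPEVDTCINEDVESLRKTVQDLLAKLQEAKRQHQSDCVAFEVT"
    "LSRYQREAEQSNVALQREEDRVEQKEAEVGELQRRLLGMETEHQALLAKV"
    "REGEVALEELRSNNADCQAEREKAATLEKEVAGLREKIHHLDDMLKSQQR"
    "KVRQMIEQLQNSKAVIQSKDATIQELKEKIAYLEAENLEMHDRMEHLIEK"
    "QISHGNFSTQARAKTENPGSIRISKPPSPKPMPVIRVVET"
)

# N-{P}-[ST]-{P} written as a regex; the lookahead allows overlapping hits.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation(peptide: str):
    """Return (start, end_exclusive, motif) tuples with 1-based start positions."""
    hits = []
    for match in N_GLYC.finditer(peptide):
        start = match.start() + 1
        motif = match.group(1)
        hits.append((start, start + len(motif), motif))
    return hits


for start, end, motif in find_n_glycosylation(PEPTIDE):
    print(f"PS00001 {start}->{end} {motif}")
```

With the peptide as listed, the scan reports hits starting at residues 2 and 356, which agrees with the two PS00001 rows of the table.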
DKFZphutel_19h17


group: intracellular transport and trafficking 

DKFZphutel_19h17 encodes a novel 879 amino acid protein, with similarity to N.crassa osbP
oxysterol-binding protein.

The novel protein contains an oxysterol-binding protein family signature. Mammalian
oxysterol-binding protein (OSBP) is a protein that binds a variety of oxysterols (oxygenated
derivatives of cholesterol). OSBP seems to play a complex role in the regulation of sterol
metabolism. OSBP is a cytosolic/Golgi receptor for oxysterols such as 25-hydroxycholesterol,
and thus a potential target of sphingomyelin turnover and cholesterol mobilization at the
plasma membrane and/or Golgi apparatus. Therefore, the new protein seems to be involved in
oxysterol metabolism.

The new protein can find application in modulating the response of cells to oxysterols. The
protein can be used as a marker for the Golgi system. The protein might be used to direct drugs
to the Golgi system in response to oxidative stress.


strong similarity to C.elegans ZK1086.1 and oxysterol-binding proteins 

complete cDNA, complete cds, few EST hits 

similarity to proteins involved in steroid biosynthesis 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3828 bp 

Poly A stretch at pos. 3811, polyadenylation signal at pos. 3784


1 GCCCGCGCGC CCGGCCGGCC CGGAGCACCG AGCTCGCGGC ACGGTAGGAG 
51 AAGCCCCCGA GCGCCCACAG CATGAAGGAG GAGGCCTTCC TCCGGCGCCG 
101 CTTCTCCCTG TGTCCACCTT CCTCCACCCC TCAGAAAGTC GACCCCCGGA 
151 AGCTCACCCG GAACTTGCTC CTCAGCGGAG ACAATGAGCT CTACCCACTC 
201 AGCCCAGGGA AGGACATGGA GCCCAACGGC CCGTCGCTGC CCAGGGATGA 
251 AGGGCCCCCG ACCCCAAGCT CTGCCACGAA GGTGCCACCG GCAGAGTACA 
301 GGCTGTGCAA CGGGTCAGAC AAGGAATGTG TGTCCCCCAC CGCCAGGGTC 
351 ACCAAGAAGG AGACTCTCAA GGCGCAGAAG GAGAACTACC GGCAGGAGAA 
401 GAAGCGCGCC ACACGGCAGC TGCTCAGCGC TCTGACAGAC CCCAGCGTGG 
451 TCATCATGGC TGACAGCCTG AAGATCCGCG GCACCCTGAA GAGCTGGACC 
501 AAGCTGTGGT GCGTGCTGAA GCCGGGGGTG CTGCTCATCT ACAAGACGCC
551 CAAGGTGGGC CACTGGGTGG GCACGGTGCT GCTGCACTGC TGCGAGCTCA 
601 TCGAGCGGCC CTCCAAGAAG GACGGCTTCT GCTTCAAGCT CTTCCACCCG 
651 CTGGATCAGT CCGTCTGGGC CGTGAAGGGC CCCAAAGGTG AGAGCGTGGG 
701 CTCCATCACA CAGCCCCTGC CCAGCAGCTA CCTGATCTTC AGGGCCGCCT
751 CCGAGTCAGA TGGTCGCTGC TGGCTGGACG CCCTGGAGCT GGCCCTGCGC 
801 TGCTCTAGCC TACTGAGACT GGGCACCTGC AAGCCGGGCC GAGACGGGGA 
851 GCCAGGGACC TCGCCAGACG CATCACCCTC ATCGCTCTGT GGGCTGCCAG 
901 CCTCAGCCAC TGTCCACCCA GACCAAGACC TGTTCCCACT GAACGGGTCT 
951 TCCCTGGAGA ACGATGCATT CTCAGACAAG TCGGAGAGAG AGAACCCTGA 
1001 GGAGTCAGAT ACCGAGACCC AGGACCATAG CCGGAAGACG GAGAGTGGCA 
1051 GCGACCAGTC AGAGACCCCT GGGGCCCCGG TGCGGAGAGG GACCACCTAT 
1101 GTGGAGCAGG TCCAGGAGGA GCTGGGGGAG CTGGGCGAGG CGTCCCAGGT 
1151 GGAGACAGTG TCAGAGGAGA ACAAGAGTCT GATGTGGACC CTGCTGAAGC 
1201 AGCTACGGCC AGGCATGGAC CTGTCCCGCG TGGTGCTACC CACGTTCGTA 
1251 CTGGAGCCGC GCTCCTTCCT GAACAAGCTC TCCGACTACT ACTACCACGC 
1301 AGACCTGCTC TCCAGGGCTG CGGTGGAGGA GGATGCCTAC AGCCGCATGA 
1351 AGCTGGTGCT GCGGTGGTAC CTGTCTGGCT TCTACAAGAA GCCCAAGGGA 
1401 ATCAAGAAGC CGTACAACCC CATCCTGGGG GAGACCTTCC GCTGCTGCTG 
1451 GTTCCACCCG CAGACTGACA GCCGCACATT CTACATAGCA GAGCAGGTGT 
1501 CCCACCACCC GCCCGTGTCT GCCTTCCACG TCAGCAACCG GAAGGACGGC 
1551 TTCTGCATCA GTGGCAGCAT CACAGCCAAG TCCAGGTTTT ATGGGAACTC 
1601 GCTGTCGGCG CTGCTGGACG GCAAAGCCAC GCTCACCTTC CTGAACCGAG 
1651 CCGAGGATTA CACCCTTACC ATGCCCTACG CCCACTGCAA AGGAATCCTG 
1701 TATGGCACGA TGACCCTGGA GCTGGGTGGG AAGGTCACCA TCGAGTGTGC 
1751 GAAGAACAAC TTCCAGGCCC AGCTGGAATT CAAACTCAAG CCCTTCTTCG 
1801 GGGGTAGCAC CAGCATCAAC CAGATCTCGG GAAAGATCAC GTCGGGAGAG 
1851 GAAGTCCTGG CGAGCCTCAG TGGCCACTGG GACAGGGACG TGTTTATCAA 
1901 GGAGGAAGGG AGCGGAAGCA GTGCGCTTTT CTGGACCCCG AGCGGGGAGG 
1951 TCCGCAGACA GAGGCTGAGG CAGCACACGG TGCCGCTGGA GGAGCAGACG 
2001 GAGCTGGAGT CCGAGAGGCT CTGGCAGCAC GTCACCAGGG CCATCAGCAA 
2051 GGGCGACCAG CACAGGGCCA CACAGGAGAA GTTTGCACTG GAGGAGGCAC 
2101 AGCGGCAGCG GGCCCGTGAG CGGCAGGAGA GCCTCATGCC CTGGAAGCCG 
2151 CAGCTGTTCC ACCTGGACCC CATCACCCAG GAGTGGCACT ACCGATACGA 
2201 GGACCACAGC CCCTGGGACC CCCTGAAGGA CATCGCCCAG TTTGAGCAAG 
2251 ACGGGATCCT GCGGACCTTG CAGCAGGAGG CCGTGGCCCG CCAGACCACC 
2301 TTCCTGGGCA GCCCAGGGCC CAGGCACGAG AGGTCTGGCC CAGACCAGCG
2351 GCTTCGCAAG GCCAGCGACC AGCCCTCCGG CCACAGCCAG GCCACGGAGA 
2401 GCAGCGGATC CACGCCTGAG TCCTGCCCAG AGCTCTCAGA CGAGGAGCAG 
2451 GATGGTGACT TTGTCCCTGG CGGTGAGAGC CCATGCCCTC GGTGCAGGAA 
2501 GGAGGCGCGG CGGCTGCAGG CCCTGCACGA GGCCATCCTC TCCATCCGAG 
2551 AGGCCCAGCA GGAGCTGCAC AGGCACCTCT CGGCCATGCT GAGCTCCACG 
2601 GCACGGGCAG CACAGGCACC GACCCCAGGC CTCCTGCAGA GCCCCCGATC 
2651 CTGGTTCCTG CTCTGCGTGT TCCTGGCGTG TCAGCTGTTC ATTAACCACA 
2701 TCCTCAAATA GGAGCCCTGG GGGCAGAGCT CCTGGCCAGT CCCGAGCCCT 
2751 CCCTCCCAGG CACCCAGCAC TTTAAGCCTG CTCCATGGAG GCAGAGAGGC 
2801 CCGGCAAGCA CAGCCACTGT GACGGGGAGT CCAGGCGCAG GAGGGACCCG 
2851 GGGCCACAAG GCGCTGCGGG CCCAGGTGTG CTGGGCCCCT CTCAGGGGCA 
2901 CTGGCCTCTC TGCAGGGCCT TCCGCCCAGC GCTGGCCTTA ATGCTAAAGC 
2951 CAAATGCAGC TTCTGCTGTG CGACGCACTC CTGGCCATCT TGCCGTGTCA 
3001 CCCCCTGTCC GGCCTCCACT TGCCATGGGG GATGGATGGA TTTAGGGTGG 
3051 GAGGGCCTGT GGGGGCCCTG GACAGTCACA CCCCAGCAGC AGTGAGTGGG 
3101 CAGGTTTGGA GGAGCAGCCA GGGAGCCCCG AGTGGCCCAG GAGTCCCCCC 
3151 ACACACAGAT GCATAGGCCT GCCTTCCGGA GACCCTGTCC ACATTGCCGG 
3201 GACCACCCTG GTGGGGCCAC TGGTGGGTGC CAGGGACAGG TTAGGGCCAC 
3251 TCTGGGGAAG GCATTTTGGT TTTTTATTCC ACGCTCTGCT GTTTGGATGG 
3301 GAGCCCCACA GAGGCAGGTC CTGGAACCAC CCCACCCCCA CACCTGGACG
3351 CTCGCTCTGG TGGGGGCACA CGCAGGTGGA GGTGGTTGTG GGTGCAGGTG 
3401 TGTGCAGGGG TGTGGGGGGC GCAGGGGTGT GGCTTAGCTG GCCCCGCACC 
3451 CAGGCCGGGG AGGCTCAAGT TCGCCACTTT ACTCAGACCG ATGCACAGTC 
3501 TTCCCATTTT ACACTTTTTT AATAAACATA ATTGCAATAT TTTAGGTGGG 
3551 CTGCGAGCTG CAGTCAGCCT TCACGTCTGG CCTCAGTCCC CGTGTCAGTG 
3601 CCGCTCTGCG TGTGCGTGTG CGCGTGTGTG AGCCTCTACA CATATATATA 
3651 TGTACAGAGC CTTAAACCAC ATCGTGGCGG TGCCGTCTGA GCTGTAGCGG 
3701 GTGGCTTTGT TTCCAGTTTT TGTACCCGTG TCCTTGTCTC CCCTCCTCCC 
3751 CCATCTGGGG ATGTGTCTGT GTTCCACACC TTGAAATAAA CAGACACATA 
3801 CGTGTTCTCT TAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


98315477: 

The pleckstrin homology domain of oxysterol-binding 
protein recognises a determinant specific to Golgi 
membranes.

98146266: 

A Drosophila homologue of oxysterol binding protein 
(OSBP) — implications for the role of 
OSBP. 


Peptide information for frame 3 


ORF from 72 bp to 2708 bp; peptide length: 879 
Category: strong similarity to known protein 
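
The ends of the stated ORF can be checked against the nucleotide listing by translating the first and last codons. The sketch below assumes the standard genetic code and frame 3 as stated above; `translate` and `CODON_TABLE` are illustrative names, and the two fragments are copied from the listing rows starting at positions 51 and 2651/2701 (bases 72 to 89 and 2691 to 2711).

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


orf_start = "ATGAAGGAGGAGGCCTTC"     # bases 72-89 of the insert
orf_end = "ATTAACCACATCCTCAAATAG"    # bases 2691-2711 of the insert

print(translate(orf_start))
print(translate(orf_end))
```

The first fragment translates to the N-terminal residues MKEEAF and the second to INHILK followed by a stop codon, consistent with an ORF from 72 bp to 2708 bp.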


1 MKEEAFLRRR FSLCPPSSTP QKVDPRKLTR NLLLSGDNEL YPLSPGKDME 
51 PNGPSLPRDE GPPTPSSATK VPPAEYRLCN GSDKECVSPT ARVTKKETLK 
101 AQKENYRQEK KRATRQLLSA LTDPSVVIMA DSLKIRGTLK SWTKLWCVLK
151 PGVLLIYKTP KVGQWVGTVL LHCCELIERP SKKDGFCFKL FHPLDQSVWA 
201 VKGPKGESVG SITQPLPSSY LIFRAASESD GRCWLDALEL ALRCSSLLRL 
251 GTCKPGRDGE PGTSPDASPS SLCGLPASAT VHPDQDLFPL NGSSLENDAF 
301 SDKSERENPE ESDTETQDHS RKTESGSDQS ETPGAPVRRG TTYVEQVQEE 
351 LGELGEASQV ETVSEENKSL MWTLLKQLRP GMDLSRVVLP TFVLEPRSFL 
401 NKLSDYYYHA DLLSRAAVEE DAYSRMKLVL RWYLSGFYKK PKGIKKPYNP 
451 ILGETFRCCW FHPQTDSRTF YIAEQVSHHP PVSAFHVSNR KDGFCISGSI 
501 TAKSRFYGNS LSALLDGKAT LTFLNRAEDY TLTMPYAHCK GILYGTMTLE 
551 LGGKVTIECA KNNFQAQLEF KLKPFFGGST SINQISGKIT SGEEVLASLS 
601 GHWDRDVFIK EEGSGSSALF WTPSGEVRRQ RLRQHTVPLE EQTELESERL 
651 WQHVTRAISK GDQHRATQEK FALEEAQRQR ARERQESLMP WKPQLFHLDP
701 ITQEWHYRYE DHSPWDPLKD IAQFEQDGIL RTLQQEAVAR QTTFLGSPGP
751 RHERSGPDQR LRKASDQPSG HSQATESSGS TPESCPELSD EEQDGDFVPG
801 GESPCPRCRK EARRLQALHE AILSIREAQQ ELHRHLSAML SSTARAAQAP
851 TPGLLQSPRS WFLLCVFLAC QLFINHILK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19h17, frame 3

TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid
ZK1086, N = 1, Score = 1495, P = 2.7e-153

PIR:S25324 hypothetical protein YKR003w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 574, P = 8.5e-57

TREMBL:CEAF195^7 gene: "C32F10.1"; Caenorhabditis elegans cosmid
C32F10., N = 1, Score = 588, P = 8.6e-57

PIR:S46796 hypothetical protein YKR003w homolog YHR001w - yeast
(Saccharomyces cerevisiae), N = 1, Score = 585, P = 1.9e-56

TREMBL:NCOSBP_1 gene: "osbP"; product: "oxysterol-binding protein";
N.crassa mRNA for putative oxysterol-binding protein, N = 1, Score =
571, P = 7e-55

TREMBL:AB017026_1 product: "oxysterol-binding protein"; Mus musculus
mRNA for oxysterol-binding protein, complete cds., N = 2, Score = 328,
P = 3e-35


>TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086
Length = 751

HSPs:

Score = 1495 (224.3 bits), Expect = 2.7e-153, P = 2.7e-153
Identities = 327/663 (49%), Positives = 430/663 (64%)

Query: 129 MADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKV— GQWVGTVLLHCCELIERPSKKDGF 186 

MAD+LKIRG LK W + +CVLKPG+L++YK K G WVGTVLL+ CELIERPSKKDGF 
Sbjct: 1 MADTLKIRGALKRWNRYYCVLKPGLLILYKHKKADRGDWVGTVLLNHCELIERPSKKDGF 60 

Query: 187 CFKLFHPLDQSVWAVKGPKGESVGSIT-QPLPSSYLIFRAASESDGRCWLDALELALRCS 245 

CFKLFHP+D S+W +GP G+S GS T PL +S+Lr RA S+ GRCW+DALEL+ +C+ 
Sbjct: 61 CFKLFHPMDMSIWGNRGPLGQSFGSFTLNPLNTSFLICRAPSDQAGRCWMDALELSFKCT 120 

Query: 246 SLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAFSDK-S 304 

LL+ T D + G D+S +G ++DD G AS+ + 

Sbjct: 121 GLLKK-TMNE-LDDKNG DSSMND— GQRDESRMSRDSD GDDTRELAVSETDA 168 

Query: 305 ERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTT— YVEQVQEELGELGEASQVE 361 

E+ E D + +DH E G SET +R T ++ +E G G S E 
Sbjct: 169 EKHFQEIDDVQDEDH EDGK-MSETSDT-IREAFTESAWIPSPKEVFGPDG—SLTE 220 

Query: 362 TVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEED 421 

V EENKSL+WTLLKQ+RPGMDLS+VVLPTF+LEPRSFL KL+DYYYHADL+S A E D 
Sbjct: 221 EVGEENKSLIWTLLKQIRPGMDLSKWLPTFILEPRSFLEKLADYYYHADLISEAVAEPD 280 

Query: 422 AYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHPP 481 

+ R+ V +++LSGFYKKPKG+KKPyNPILGETFRC W HP S TFY+AEQVSHHPP 
Sbjct: 281 PFQRIVKVTKFFLSGFYKKPKGLKKPYNPILGETFRCKWEHPD-GSTTFYMAEQVSHHPP 339 

Query: 482 VSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCKG 541 

VS+ ++NRK GF ISG+I AKS++YGNSLSA+L GK LT LN E Y + +PYA+CKG 
Sbjct: 340 VSSLFITNRKAGFNISGTILAKSKYYGNSLSAILAGKLRLTLLNLGETYIVNLPYANCKG 399 

Query: 542 ILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLSG 601 

1+ GTMT+ELGG+V IEC K ++ L+FKLKP GG+ NQI G I G + LAS+ G 
Sbjct: 400 IMIGTMTMELGGEVNIECEKTGYRTTLDFKLKPMLGGA— YNQIEGSIKYGSDRLASIEG 457 

Query: 602 HWDRDVFIKEEGSGSSALEWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISKG 661 

, WD + IK G W P+ EV + RL ++ + ++EQ E ES +LW+HVT AIS 

Sbjct: 458 AWDGVIRIK--GPDGKKELWNPTPEVIKTRLPRYEINMDEQGEWESAKLWRHVTEAISNE 515

Query: 662 DQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKDI 721 

DQ++AT+EK ALE QR RA+ S +P + + F ++ Y + D+ PWD DI 

Sbjct: 516 DQYKATEEKTALENDQRARAK SGIPHETKFFKKQH-GDDYVYIHADYRPWDNNNDI 570 
Query: 722 AQFEQDGILRTLQQEAVAR--QTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSG 779

0 E + +++T+ + + + + LGS E S D+ + +P + + 

Sbjct: 571 QQIENNYVVKTISRHSKRKTGNSEQLGSDNTS-EASESDEEVI EPKIKKKEIVPAK 625 

Query: 780 STPESCPELSDE 791 

S P + PE++DE 
Sbjct: 626 SKPIT-PEVADE 636 
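
The percentages in the HSP summary for this alignment (327 and 430 out of 663 aligned positions) can be recomputed from the counts. The sketch below assumes that the report truncates percentages to whole numbers rather than rounding them, which is what the reported values suggest; `blast_percent` is an illustrative helper, not a function of any BLAST package.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed reporting rule)."""
    return 100 * matches // aligned


identities = blast_percent(327, 663)
positives = blast_percent(430, 663)
print(f"Identities = 327/663 ({identities}%), Positives = 430/663 ({positives}%)")
```

Rounding would give 65% for the positives (430/663 is about 64.9%), so the reported 64% is consistent with truncation under this assumption.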


Pedant information for DKFZphutel_19h17, frame 3


Report for DKFZphutel_19h17.3


[LENGTH] 879
[MW] 98616.79
[pI] 7.29
[HOMOL] TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086 1e-157
[FUNCAT] 01.06.16 lipid and fatty-acid binding [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YAR044w] 5e-20
[BLOCKS] BL00168F
[BLOCKS] BL01013D Oxysterol-binding protein family proteins
[BLOCKS] BL01013C Oxysterol-binding protein family proteins
[BLOCKS] BL01013B Oxysterol-binding protein family proteins
[BLOCKS] BL01013A Oxysterol-binding protein family proteins
[PIRKW] transmembrane protein 1e-19
[SUPFAM] pleckstrin repeat homology 8e-18
[SUPFAM] ankyrin repeat homology 1e-19
[SUPFAM] unassigned ankyrin repeat proteins 1e-19
[PROSITE] MYRISTYL 12
[PROSITE] CAMP_PHOSPHO_SITE 6
[PROSITE] OSBP 1
[PROSITE] CK2_PHOSPHO_SITE 21
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 20
[PROSITE] ASN_GLYCOSYLATION 3
[PFAM] PH (pleckstrin homology) domain
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 2.96 %
[KW] COILED_COIL 3.53 %



SEQ   MKEEAFLRRRFSLCPPSSTPQKVDPRKLTRNLLLSGDNELYPLSPGKDMEPNGPSLPRDE
SEG
PRD   ccchhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc
COILS
MEM

SEQ   GPPTPSSATKVPPAEYRLCNGSDKECVSPTARVTKKETLKAQKENYRQEKKRATRQLLSA
SEG
PRD   cccccccccccccceeeecccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC
MEM

SEQ   LTDPSVVIMADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKVGQWVGTVLLHCCELIERP
SEG
PRD   hcccceeeecccccccccccccceeeeeeccceeeeecccccccceeeeecccccccccc
COILS CCC
MEM

SEQ   SKKDGFCFKLFHPLDQSVWAVKGPKGESVGSITQPLPSSYLIFRAASESDGRCWLDALEL
SEG
PRD   ccccceeeeecccccceeeeecccccceeecccccccceeeeeeehhhhhhhhhhhhhhh
COILS
MEM

SEQ   ALRCSSLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAF
SEG
PRD   hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc
COILS
MEM

SEQ   SDKSERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTTYVEQVQEELGELGEASQV
SEG   xxxxxxxxxxxxx . . . .
PRD   cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccc
COILS
MEM

SEQ   ETVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEE
SEG
PRD   cccccccchhhhhhhhhhcccccceeeccceeeecccchhhhhhhhhccccccccccccc
COILS
MEM

SEQ   DAYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHP
SEG
PRD   chhhhhhhhhhhhhhhcccccccccccccccccceeeeeecccccccceeeeeccccccc
COILS
MEM

SEQ   PVSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCK
SEG
PRD   cceeeeecccccccccccccccccccccccccccccceeeeeeccccceeeeccccceee
COILS
MEM

SEQ   GILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLS
SEG
PRD   eeeeeccccccccceeeeeccccccceeeecccccccccccceeeeeccccccceeeeec
COILS
MEM

SEQ   GHWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISK
SEG
PRD   cccccceeeeeccccceeeeeccccccccccccccccccchhhhhhhhhhhhhhhhhhhh
COILS
MEM

SEQ   GDQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKD
SEG   xxxxxxxxxxxxx
PRD   cchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccchh
COILS
MEM

SEQ   IAQFEQDGILRTLQQEAVARQTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSGS
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhcccccccccccccccccc
COILS
MEM

SEQ   TPESCPELSDEEQDGDFVPGGESPCPRCRKEARRLQALHEAILSIREAQQELHRHLSAML
SEG
PRD   ccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS
MEM

SEQ   SSTARAAQAPTPGLLQSPRSWFLLCVFLACQLFINHILK
SEG
PRD   hhhhhhhcccccccccccceeeeehhhhhhhhhhhhccc
COILS
MEM   .....................MMMMMMMMMMMMMMMMM.


Prosite for DKFZphutel_19h17.3


PS00001    80->84    ASN_GLYCOSYLATION   PDOC00001
PS00001    291->295  ASN_GLYCOSYLATION   PDOC00001
PS00001    367->371  ASN_GLYCOSYLATION   PDOC00001
PS00004    9->13     CAMP_PHOSPHO_SITE   PDOC00004
PS00004    26->30    CAMP_PHOSPHO_SITE   PDOC00004
PS00004    95->99    CAMP_PHOSPHO_SITE   PDOC00004
PS00004    111->115  CAMP_PHOSPHO_SITE   PDOC00004
PS00004    338->342  CAMP_PHOSPHO_SITE   PDOC00004
PS00004    762->766  CAMP_PHOSPHO_SITE   PDOC00004
PS00005    82->85    PKC_PHOSPHO_SITE    PDOC00005
PS00005    90->93    PKC_PHOSPHO_SITE    PDOC00005
PS00005    94->97    PKC_PHOSPHO_SITE    PDOC00005
PS00005    98->101   PKC_PHOSPHO_SITE    PDOC00005
PS00005    132->135  PKC_PHOSPHO_SITE    PDOC00005
PS00005    138->141  PKC_PHOSPHO_SITE    PDOC00005
PS00005    159->162  PKC_PHOSPHO_SITE    PDOC00005
PS00005    181->184  PKC_PHOSPHO_SITE    PDOC00005
PS00005    252->255  PKC_PHOSPHO_SITE    PDOC00005
PS00005    301->304  PKC_PHOSPHO_SITE    PDOC00005
PS00005    304->307  PKC_PHOSPHO_SITE    PDOC00005
PS00005    320->323  PKC_PHOSPHO_SITE    PDOC00005
PS00005    455->458  PKC_PHOSPHO_SITE    PDOC00005
PS00005    488->491  PKC_PHOSPHO_SITE    PDOC00005
PS00005    501->504  PKC_PHOSPHO_SITE    PDOC00005
PS00005    586->589  PKC_PHOSPHO_SITE    PDOC00005
PS00005    647->650  PKC_PHOSPHO_SITE    PDOC00005
PS00005    824->827  PKC_PHOSPHO_SITE    PDOC00005
PS00005    843->846  PKC_PHOSPHO_SITE    PDOC00005
PS00005    857->860  PKC_PHOSPHO_SITE    PDOC00005
PS00006    82->86    CK2_PHOSPHO_SITE    PDOC00006
PS00006    94->98    CK2_PHOSPHO_SITE    PDOC00006
PS00006    181->185  CK2_PHOSPHO_SITE    PDOC00006
PS00006    227->231  CK2_PHOSPHO_SITE    PDOC00006
PS00006    263->267  CK2_PHOSPHO_SITE    PDOC00006
PS00006    293->297  CK2_PHOSPHO_SITE    PDOC00006
PS00006    304->308  CK2_PHOSPHO_SITE    PDOC00006
PS00006    312->316  CK2_PHOSPHO_SITE    PDOC00006
PS00006    325->329  CK2_PHOSPHO_SITE    PDOC00006
PS00006    342->346  CK2_PHOSPHO_SITE    PDOC00006
PS00006    358->362  CK2_PHOSPHO_SITE    PDOC00006
PS00006    362->366  CK2_PHOSPHO_SITE    PDOC00006
PS00006    590->594  CK2_PHOSPHO_SITE    PDOC00006
PS00006    643->647  CK2_PHOSPHO_SITE    PDOC00006
PS00006    659->663  CK2_PHOSPHO_SITE    PDOC00006
PS00006    713->717  CK2_PHOSPHO_SITE    PDOC00006
PS00006    755->759  CK2_PHOSPHO_SITE    PDOC00006
PS00006    780->784  CK2_PHOSPHO_SITE    PDOC00006
PS00006    784->788  CK2_PHOSPHO_SITE    PDOC00006
PS00006    789->793  CK2_PHOSPHO_SITE    PDOC00006
PS00006    824->828  CK2_PHOSPHO_SITE    PDOC00006
PS00007    402->409  TYR_PHOSPHO_SITE    PDOC00007
PS00007    415->424  TYR_PHOSPHO_SITE    PDOC00007
PS00008    137->143  MYRISTYL            PDOC00008
PS00008    163->169  MYRISTYL            PDOC00008
PS00008    274->280  MYRISTYL            PDOC00008
PS00008    326->332  MYRISTYL            PDOC00008
PS00008    381->387  MYRISTYL            PDOC00008
PS00008    498->504  MYRISTYL            PDOC00008
PS00008    508->514  MYRISTYL            PDOC00008
PS00008    541->547  MYRISTYL            PDOC00008
PS00008    552->558  MYRISTYL            PDOC00008
PS00008    577->583  MYRISTYL            PDOC00008
PS00008    613->619  MYRISTYL            PDOC00008
PS00008    728->734  MYRISTYL            PDOC00008
PS00013    860->871  PROKAR_LIPOPROTEIN  PDOC00013
PS01013    474->485  OSBP                PDOC00774
HMM_NAME PH (pleckstrin homology) domain

HMM       *dvIREGWMyKWgswrkstgnWqrRWFvLrndpnrLiYYkddkdekPrYM
          +VI+ +++++G + W + W+VL++ ++L+ YK + + + ++
Query 126 VVIMADSLKIRGTLKS WTKLWCVLKP--GVLLIYKTP-KVGQWVG 167

HMM       lldldcWrMidVEidWmradndHCFilWtrq
          L+C+ +I+ ++ ++ +CF+++ +
Query 168 TVLLHCCELIERPSKKD GFCFKLFHPLDQSVWAVKGPKGESVGSITQ 214

HMM       rtYYFQAeNeEEMmeWMsalrRalw*
          + ++F+A++E++ + W++A++ A++
Query 215 PLPSSYLIFRAASESDGRCWLDALELALR 243
DKFZphutel_19j11


group: uterus derived 

DKFZphutel_19j11 encodes a novel 708 amino acid protein with C-terminal similarity to several
known proteins, such as human KIAA0231 or murine ras binding protein Sur8.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.


Strong similarity to KIAA0231, similarity to ras binding protein Sur8

EST AA854189 extends the sequence (294 bp); with this sequence, complete cDNA

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2343 bp 

Poly A stretch at pos. 2323, polyadenylation signal at pos. 2295 


1 GCTCCTGCTA ACCCCATCAC TGTGGAAATG AAAGGCCTGA AGACAGATTT 
51 GGACCTTCAG CAGTACAGCT TTATAAATCA GATGTGTTAT GAGCGAGCCC 
101 TCCACTGGTA TGCCAAGTAT TTCCCTTACC TTGTCCTCAT CCATACCCTG 
151 GTCTTTATGC TCTGCAGTAA CTTTTGGTTC AAATTCCCTG GTTCCAGCTC 
201 CAAAATAGAA CATTTCATCT CCATTCTGGG GAAGTGTTTT GACTCTCCTT 
251 GGACCACACG GGCTTTATCT GAAGTGTCTG GGGAGGACTC AGAAGAAAAG 
301 GACAACAGGA AGAACAACAT GAACAGGTCC AACACCATCC AATCTGGTCC 
351 AGAAGGCAGC CTGGTCAACT CTCAGTCTTT AAAGTCCATT CCTGAGAAGT 
401 TTGTAGTTGA TAAATCCACT GCAGGGGCTC TGGATAAAAA GGAAGGTGAG 
451 CAGGCTAAGG CCTTATTTGA GAAGGTGAAG AAGTTCAGGC TGCATGTGGA 
501 AGAAGGTGAT ATTCTATATG CCATGTATGT TCGCCAGACT GTACTTAAAG 
551 TTATCAAATT CCTAATCATC ATTGCATATA ATAGTGCTCT GGTTTCCAAG 
601 GTCCAGTTTA CAGTGGACTG TAATGTGGAC ATTCAGGACA TGACTGGATA 
651 TAAAAACTTT TCTTGCAATC ATACCATGGC ACACTTGTTC TCAAAACTGT 
701 CCTTTTGCTA TCTGTGCTTT GTTAGTATCT ATGGATTGAC GTGCCTTTAT 
751 ACCTTATACT GGCTGTTCTA CCGTTCTCTA CGGGAATATT CCTTTGAGTA 
801 TGTCCGTCAG GAGACTGGAA TTGATGATAT TCCAGATGTG AAAAATGACT 
851 TTGCTTTTAT GCTTCATATG ATAGATCAGT ATGACCCTCT CTATTCCAAG 
901 AGATTTGCAG TGTTCCTGTC TGAAGTCAGT GAAAACAAAT TAAAGCAGCT 
951 GAACTTAAAT AACGAATGGA CTCCTGATAA ACTGAGGCAG AAGCTACAGA 
1001 CAAATGCCCA TAATCGACTG GAATTGCCTC TTATCATGCT CTCTGGCCTT
1051 CCAGACACTG TTTTTGAAAT CACAGAGTTG CAATCTCTAA AACTTGAAAT 
1101 CATTAAGAAC GTAATGATAC CAGCCACCAT TGCACAGCTA GACAATCTTC 
1151 AAGAGCTCTC TCTGCACCAG TGTTCTGTCA AAATCCACAG TGCGGCGCTC 
1201 TCTTTCCTGA AGGAAAACCT CAAGGTCTTG AGCGTCAAGT TTGATGACAT
1251 GAGGGAACTC CCCCCCTGGA TGTATGGGCT CCGAAATCTG GAAGAGCTGT 
1301 ACCTAGTTGG CTCTCTAAGT CATGATATTT CCAG7AATGT CACCCTTGAG 
1351 TCTCTGCGGG ATCTCAAAAG CCTTAAAATT CTCTCTATCA AAAGCAACGT 
1401 TTCCAAAATC CCTCAGGCAG TGGTTGATGT TTCCAGCCAT CTCCAGAAGA 
1451 TGTGCATACA TAATGATGGC ACCAAGCTGG TGATGCTCAA CAACTTAAAG 
1501 AAGATGACCA ATCTGACAGA GCTGGAGCTG GTCCACTGTG ACCTGGAGCG
1551 TATTCCTCAT GCTGTGTTCA GCCTACTCAG CCTCCAGGAA TTGGACCTGA 
1601 AGGAAAACAA TCTGAAATCT ATAGAAGAAA TCGTTAGCTT TCAGCACTTA 
1651 AGAAAGTTGA CAGTGCTAAA ACTGTGGCAT AACAGCATCA CCTACATCCC 
1701 AGAGCATATA AAGAAACTCA CCAGCCTGGA ACGCCTGTCC TTTAGTCACA 
1751 ATAAAATAGA GGTGCTGCCT TCCCACCTCT TCCTATGCAA CAAGATCCGA 
1801 TACTTGGACT TATCGTACAA TGACATTCGA TTTATCCCCC CTGAAATTGG 
1851 AGTTCTACAA AGTTTACAGT ATTTTTCCAT CACATGTAAC AAAGTGGAZA 
1901 GCCTTCCAGA TGAACTCTAC TTCTGCAAGA AACTTAAAAC TCTGAAGATT 
1951 GGAAAAAACA GCCTATCTGT ACTTTCACCG AAAATTGGAA ATTTGCTATT 
2001 TCTTTCCTAC TTAGATGTAA AAGGTAATCA CTTTGAAATC CTCCCTCCTG 
2051 AACTGGGTGA CTGTCGGGCT CTGAAGCGAG CTGGTTTAGT TGTAGAAGAT 
2101 GCTCTGTTTG AAACTCTGCC TTCTGACGTC CGGGAGCAAA TGAAAACAGA 
2151 ATAACTTATT TTTCGTTAAA GTTTGACTGA AACACGCTTC TACCAAATAC 
2201 AGTATAAATA ATTAGGTAGT CTTAATGCCT TTCCTATTTT TTTTTCCTTT 
2251 TCACACAAAA TGTACACAAA GATCGCGTAA GGAGTATGTA TTTTTAATAA 
2301 AAATTTAATT GTATTTTTTC AATATTAAAA AAAAAAAAAA AAA 
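
The nucleotide listings in this document use rows of 50 bases in blocks of ten, each row labelled with the position of its first base. A layout check of this kind can flag rows whose label or blocks do not fit that scheme. The sketch below is only an illustration under that assumption; `check_listing_rows` and its parameters are hypothetical names, and the mislabelled example row is constructed for the demonstration.

```python
import re


def check_listing_rows(rows, first_position=1, row_width=50):
    """Return (row_index, problem) pairs for rows that break the expected layout.

    Assumptions: every row is non-empty, starts with its position label,
    and continues with blocks of up to ten bases (A, C, G or T).
    """
    problems = []
    for index, row in enumerate(rows):
        label, *blocks = row.split()
        expected = first_position + row_width * index
        if label != str(expected):
            problems.append((index, f"label {label!r}, expected {expected}"))
        for block in blocks:
            if not re.fullmatch(r"[ACGT]{1,10}", block):
                problems.append((index, f"unexpected block {block!r}"))
    return problems


rows = [
    "951 GAACTTAAAT AACGAATGGA CTCCTGATAA ACTGAGGCAG AAGCTACAGA",
    "lOOl CAAATGCCCA TAATCGACTG GAATTGCCTC TTATCATGCT CTCTGGCCTT",
    "1051 CCAGACACTG TTTTTGAAAT CACAGAGTTG CAATCTCTAA AACTTGAAAT",
]
for index, problem in check_listing_rows(rows, first_position=951):
    print(index, problem)
```

A check like this only validates the layout; it cannot decide which base an unreadable character originally stood for.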


BLAST Results 


No BLAST result 
Medline entries


96421675:

Characterization of densin-180, a new brain-specific synaptic protein
of the O-sialoglycoprotein family.

98337190:

SUR-8, a conserved Ras-binding protein with leucine-rich
repeats, positively regulates Ras-mediated signaling in C.
elegans.


Peptide information for frame 1 


ORF from 28 bp to 2151 bp; peptide length: 708 
Category: similarity to known protein 
Classification: Cell signaling/communication 


1 MKGLKTDLDL QQYSFINQMC YERALHWYAK YFPYLVLIHT LVFMLCSNFW 
51 FKFPGSSSKI EHFISILGKC FDSPWTTRAL SEVSGEDSEE KDNRKNNMNR 
101 SNTIQSGPEG SLVNSQSLKS IPEKFVVDKS TAGALDKKEG EQAKALFEKV 
151 KKFRLHVEEG DILYAMYVRQ TVLKVIKFLI IIAYNSALVS KVQFTVDCNV 
201 DIQDMTGYKN FSCNHTMAHL FSKLSFCYLC FVSIYGLTCL YTLYWLFYRS 
251 LREYSFEYVR QETGIDDIPD VKNDFAFMLH MIDQYDPLYS KRFAVFLSEV 
301 SENKLKQLNL NNEWTPDKLR QKLQTNAHNR LELPLIMLSG LPDTVFEITE 
351 LQSLKLEIIK NVMIPATIAQ LDNLQELSLH QCSVKIHSAA LSFLKENLKV 
401 LSVKFDDMRE LPPWMYGLRN LEELYLVGSL SHDISRNVTL ESLRDLKSLK 
451 ILSIKSNVSK IPQAVVDVSS HLQKMCIHND GTKLVMLNNL KKMTNLTELE
501 LVHCDLERIP HAVFSLLSLQ ELDLKENNLK SIEEIVSFQH LRKLTVLKLW 
551 HNSITYIPEH IKKLTSLERL SFSHNKIEVL PSHLFLCNKI RYLDLSYNDI 
601 RFIPPEIGVL QSLQYFSITC NKVESLPDEL YFCKKLKTLK IGKNSLSVLS 
651 PKIGNLLFLS YLDVKGNHFE ILPPELGDCR ALKRAGLVVE DALFETLPSD
701 VREQMKTE 
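
The ORF coordinates, reading frame and peptide length quoted above can be cross-checked with a few lines of code. The sketch below is not part of the original report: the function name `orf_summary` is hypothetical, and it assumes 1-based, inclusive coordinates that exclude the stop codon, which is what the numbers in this report imply.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (frame, peptide_length) for a 1-based, inclusive ORF.

    Assumption: the coordinates exclude the stop codon, so the span
    must be an exact multiple of three bases.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # reading frames are numbered 1, 2, 3
    return frame, span // 3


# ORF quoted above: 28 bp to 2151 bp
print(orf_summary(28, 2151))  # (1, 708)
```

The result agrees with the "frame 1" and "peptide length: 708" values given for this clone.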

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19j11, frame 1

TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene,
partial cds., N = 1, Score = 1408, P = 4.5e-144

TREMBL:AF054827_1 gene: "soc-2"; product: "leucine-rich repeat protein
SOC-2"; Caenorhabditis elegans leucine-rich repeat protein SOC-2
(soc-2) mRNA, complete cds., N = 1, Score = 304, P = 5.7e-24

TREMBL:RNU66707_1 product: "densin-180"; Rattus norvegicus densin-180
mRNA, complete cds., N = 1, Score = 311, P = 7.4e-24

TREMBL:AF068921_1 product: "Ras-binding protein SUR-8"; Mus musculus
Ras-binding protein SUR-8 mRNA, complete cds., N = 1, Score = 302, P =
1.1e-23


>TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial 
cds. 

Length = 476 

HSPs:

Score = 1408 (211.3 bits), Expect = 4.5e-144, P = 4.5e-144
Identities = 265/471 (56%), Positives = 361/471 (76%)

Query: 237 LTCLYTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVF 296 

LT Y+L+W+ SL++YSFE +R+++ DIPDVKNDFAF+LH+ DQYDPLYSKRF++F
Sbjct: 1 LTSSYSLWWMLRSSLKQYSFEALREKSNYSDIPDVKNDFAFILHLADQYDPLYSKRFSIF 60 

Query: 297 LSEVSENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKL 356 

LSEVSENKLKQ+NLNNEWT +KL+ KL NA +++EL L ML+GLPD VFE+TE++ L L 
Sbjct: 61 LSEVSENKLKQINLNNEWTVEKLKSKLVKNAQDKIELHLFMLNGLPDNVFELTEMEVLSL 120 
Query: 357 EIIKNVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMY 416

E+I V +P+ ++QL NL+EL ++ S+ + AL+FL+ENLK+L +KF +M ++P W++ 
Sbjct: 121 ELIPEVKLPSAVSQLVNLKELRVYHSSLVVDHPALAFLEENLKILRLKFTEKGKIPRHVF 180 

Query: 417 GLRNLEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMC 476

L+NL+ELYL G + + + LE +DLK+L+ L +KS++S+IPQ V D+ LQK+ 
Sbjct: 181 HLKNLKELYLSGCVLPEQLSTMQLEGFQDLKNLRTLYLKSSLSRIPQVVTDLLPSLQKLS 240 

Query: 477 IHNDGTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIV 536 

+ N+G+KLV+LNNLKKM NL LEL+ CDLERIPH++FSL +L ELDL+ENNLK++EEI+ 
Sbjct: 241 LDNEGSKLVVLNNLKKMVNLKSLELISCDLERIPHSIFSLNNLHELDLRENNLKTVEEII 300 

Query: 537 SFQHLRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLS 596 

SFQHL+ L+ LKLWHN+I YIP I L++LE+LS HN IE LP LFLC K+ YLDLS 
Sbjct: 301 SFQHLQNLSCLKLWHNNIAYIPAQIGALSNLEQLSLDHNNIENLPLQLFLCTKLHYLDLS 360 

Query: 597 YNDIRFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNL 656 

YN + FIP EI L +LQYF++T N +E LPD L+ CKKL+ L +GKNSL LSP +G L 
Sbjct: 361 YNHLTFIPEEIQYLSNLQYFAVTNNNIEMLPDGLFQCKKLQCLLLGKNSLMNLSPHVGEI. 420 

Query: 657 LFLSYLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKT 707 

L++L++ GN+ E LPPEL C++LKR L+VE+ L TLP V E+++T 
Sbjct: 421 SNLTHLELIGNYLETLPPELEGCQSLKRNCLIVEENLLNTLPLPVTERLQT 471 
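
The identity and positive percentages printed for the HSP above can be reproduced from the raw counts. The sketch below is an illustration rather than the original BLAST code: the function name `blast_percent` is hypothetical, and the use of truncation instead of rounding is inferred from the figures in this report (361/471 is about 76.6%, yet 76% is printed).

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage as it appears in this report (truncated, not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned_length


identities = blast_percent(265, 471)
positives = blast_percent(361, 471)
print(identities, positives)  # 56 76
```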


Pedant information for DKFZphutel_19j11, frame 1


Report for DKFZphutel_19j11.1


[LENGTH] 708
[MW] 81812.82
[pI] 7.55
[HOMOL] TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial cds. 1e-149
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 3e-09
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 3e-09
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR353c] 3e-07
[BLOCKS] BL00868F
[BLOCKS] BL00985B Spermadhesins family proteins
[EC] 3.4.17.3 Lysine carboxypeptidase 1e-08
[EC] 4.6.1.1 Adenylate cyclase 3e-18
[PIRKW] blocked amino end 1e-10
[PIRKW] phosphotransferase 1e-09
[PIRKW] nucleus 6e-08
[PIRKW] duplication 3e-18
[PIRKW] platelet 1e-10
[PIRKW] tandem repeat 7e-16
[PIRKW] keratan sulfate 7e-07
[PIRKW] metallo-carboxypeptidase 1e-08
[PIRKW] transmembrane protein 1e-10
[PIRKW] serine/threonine-specific protein kinase 1e-09
[PIRKW] autophosphorylation 1e-09
[PIRKW] cartilage 7e-07
[PIRKW] connective tissue 7e-07
[PIRKW] magnesium 1e-09
[PIRKW] cAMP biosynthesis 3e-18
[PIRKW] ATP 1e-09
[PIRKW] receptor 1e-09
[PIRKW] leucine zipper 3e-13
[PIRKW] glycoprotein 5e-12
[PIRKW] extracellular matrix 7e-07
[PIRKW] chondroitin sulfate proteoglycan 7e-07
[PIRKW] cell adhesion 1e-08
[PIRKW] hydrolase 1e-08
[PIRKW] sulfoprotein 7e-07
[PIRKW] membrane protein 1e-08
[PIRKW] phosphorus-oxygen lyase 3e-18
[PIRKW] collagen binding 7e-07

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 3e-21
[SUPFAM] chaoptin 1e-08
[SUPFAM] gelsolin repeat homology 3e-21
[SUPFAM] protein kinase homology 1e-09
[SUPFAM] protein kinase Xa21 1e-09
[SUPFAM] fibromodulin 4e-12
[SUPFAM] yeast adenylate cyclase catalytic domain homology 3e-18
[SUPFAM] yeast adenylate cyclase 3e-18
[KW] TRANSMEMBRANE 3
[KW] LOW_COMPLEXITY 1.41 %


SEQ MKGLKTDLDLQQVSFINQMCYERALHWYAKYFFYLVLIHTLVFMLCSNFWFKFPGSSSKI

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccceeeeccccccee

MEM MMMMMMMMMMMMMMMMM 

SEQ EHFISILGKCFDSPWTTRALSEVSGEDSEEKDNRKNNMNRSNTIQSGPEGSLVNSQSLKS 

SEG 

PRD eeeeeeeecccccccceeeeecccccccccccccccccccccccccccccceeeeccccc 


MEM 


SEQ IPEKFVVDKSTAGALDKKEGEQAKALFEKVKKFRLHVEEGDILYAMYVRQTVLKVIKFLI

SEG 

PRD cccceeecccccccccchhhhhhhhhhhhhhhhhhhhcccceeeehhhhhhhhhhhhhhh 

MEM MMMMMMMMM 

SEQ IIAYNSALVSKVQFTVDCNVDIQDMTGYKNFSCNHTMAHLFSKLSFCYLCFVSIYGLTCL

SEG 

PRD hhhhcchhhhheeeeeccccccccccccccccccchhhhhhhhheeeeeeeeeeccceee 

MEM MMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ YTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVFLSEV 

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccchhhhhhhhhhhhh 

MEM 

SEQ SENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKLEIIK 

SEG ..xxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhh 

MEM 

SEQ NVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMYGLRN

SEG 

PRD hccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhccccccccccccchhhh 

MEM 

SEQ LEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMCIHND 

SEG 

PRD hhhhhhccccccccccccccchhhhhhhhhhhhcccccccccccchhhhhhhhhhhcccc 

MEM 

SEQ GTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIVSFQH

SEG 

PRD ceeeecccccccchhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccccch 

MEM 

SEQ LRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLSYNDI 

SEG 

PRD hhhhhhhcccccceeecccccchhhhhheeeccccceeecccccchhhhhhhhhhccccc 

MEM 

SEQ RFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNLLFLS 

SEG 

PRD cccccccchhhhhhhhhhhccccccccccccchhhhhcccccccceeecccccccchhhh 

MEM 

SEQ YLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKTE

SEG 

PRD hhhccccccccccccchhhhhhhhheeeeccccccccccccccccccc 

MEM 


(No Prosite data available for DKFZphutel_19j11.1)
(No Pfam data available for DKFZphutel_19j11.1)

DKFZphutel_li2


group: transcription factor 

DKFZphutel_li2 encodes a novel 594 amino acid protein similar to signal-transducing proteins.

The protein contains 2 WD-40 repeats, which are typical of the beta-transducin subunit of G
proteins. In addition, the protein contains a C3HC4 zinc finger and a leucine zipper. The beta
subunits seem to be required for the replacement of GDP by GTP as well as for membrane
anchoring and receptor recognition. Because of the zinc finger, the novel protein appears to be
a new molecule involved in both signal transduction and transcription.

The new protein can find application in modulating or blocking the expression of genes
controlled by this molecule.


similarity to Dictyostelium myosin heavy chain kinase

complete cDNA, complete cds, EST hits

[PFAM] Zinc finger, C3HC4 type (RING finger)

[PFAM] WD domain, G-beta repeats

[SCOP] d1tbgc_ 2.46.3.1.1 beta1-subunit of the signal-transducing G protein 3e-07

Sequenced by BMFZ

Locus: /map="16p13.3"

Insert length: 3584 bp

Poly A stretch at pos. 3555, polyadenylation signal at pos. 3537


1 GGGCGGGAGG TGCTTCCCAA GGACCGTAGA TGCCTCTCTA GAGCATGAGC 
51 TCAGGCAAGA GTGCCCGCTA CAACCGCTTC TCCGGGGGGC CCAGCAATCT 
101 TCCCACCCCA GACGTCACCA CAGGGACCAG AATGGAAACG ACCTTCGGAC 
151 CCGCCTTTTC AGCCGTCACC ACCATCACAA AAGCTGACGG GACCAGCACC 
201 TACAAGCAGC ACTGCAGGAC AGCATGCCCC CCATCAGCAC TCCCCGCCGC 
251 TCCGACTCCG CCATCTCTGT CCGCTCCCTG CACTCAGAGT CCAGCATGTC 
301 TCTGCGCTCC ACATTCTCAC TGCCCGAGGA GGAGGAGGAG CCGGAGCCAC 
351 TGGTGTTTGC GGAGCAGCCC TCGGTGAAGC TGTGCTGTCA GCTCTGCTGC 
401 AGCGTCTTCA AAGACCCCGT GATCACCACG TGTGGGCACA CGTTCTGTAG 
451 GAGATGCGCC TTGAAGTCAG AGAAGTGTCC CGTGGACAAC GTCAAACTGA
501 CCGTGGTGGT GAACAACATC GCGGTGGCCG AGCAGATCGG GGAGCTCTTC 
551 ATCCACTGCC GGCACGGCTG CCGGGTAGCG GGCAGCGGGA AGCCCCCCAT 
601 CTTTGAGGTG GACCCCCGAG GGTGCCCCTT CACCATCAAG CTCAGCGCCC 
651 GGAAGGACCA CGAGGGCAGC TGTGACTACA GGCCTGTGCG GTGTCCCAAC 
701 AACCCCAGCT GCCCCCCGCT GCTCAGGATG AACCTGGAGG CCCACCTCAA 
751 GGAGTGCGAG CACATCAAAT GCCCCCACTC CAAGTACGGG TGCACGTTCA 
801 TCGGGAACCA GGACACTTAC GAGACCCACC TGGAGACTTG CCGCTTCGAG 
851 GGCCTGAAGG AGTTTCTGCA GCAGACGGAT GACCGCTTCC ACGAGATGCA 
901 CGTGGCTCTG GCCCAGAAGG ACCAGGAGAT CGCCTTCCTG CGCTCCATGC 
951 TGGGAAAGCT CTCGGAGAAG ATCGACCAGC TAGAGAAGAG CCTGGAGCTC 
1001 AAGTTTGACG TCCTGGACGA AAACCAGAGC AAGCTCAGCG AGGACCTCAT 
1051 GGAGTTCCGG CGGGACGCAT CCATGTTAAA TGACGAGCTG TCCCACATCA 
1101 ACGCGCGGCT GAACATGGGC ATCCTAGGCT CCTACGACCC TCAGCAGATC 
1151 TTCAAGTGCA AAGGGACCTT TGTGGGCCAC CAGGGCCCTG TGTGGTGTCT 
1201 CTGCGTCTAC TCCATGGGTG ACCTGCTCTT CAGTGGCTCC TCTGACAAGA 
1251 CCATCAAGGT GTGGGACACA TGTACCACCT ACAAGTGTCA GAAGACACTG 
1301 GAGGGCCATG ATGGCATCGT GCTGGCTCTC TGCATCCAGG GGTGCAAACT 
1351 CTACAGCGGC TCTGCAGACT GCACCATCAT TGTGTGGGAC ATCCAGAACC 
1401 TGCAGAAGGT GAACACCATC CGGGCCCATG ACAACCCGGT GTGCACGCTG
1451 GTCTCCTCAC ACAACGTGCT CTTCAGCGGC TCCCTGAAGG CCATCAAGGT
1501 CTGGGACATC GTGGGCACTG AGCTGAAGTT GAAGAAGGAG CTCACAGGCC 
1551 TCAACCACTG GGTGCGGGCC CTGGTGGCTG CCCAGAGCTA CCTGTACAGC 
1601 GGCTCCTACC AGACAATCAA GATCTGGGAC ATCCGAACCC TTGACTGCAT 
1651 CCACGTCCTG CAGACGTCTG GTGGCAGCGT CTACTCCATT GCTGTGACAA 
1701 ATCACCACAT TGTCTGTGGC ACCTACGAGA ACCTCATCCA CGTGTGGGAC 
1751 ATTGAGTCCA AGGAGCAGGT GCGGACCCTC ACGGGCCACG TGGGCACCGT 
1801 GTATGCCCTG GCGGTCATCT CGACGCCAGA CCAGACCAAA GTCTTCAGTG 
1851 CATCCTACGA CCGGTCCCTC AGGGTCTGGA GTATGGACAA CATGATCTGC 
1901 ACGCAGACCC TGCTGCGTCA CCAGGGCAGT GTCACCGCGC TGGCTGTGTC 
1951 CCGGGGCCGA CTCTTCTCAG GGGCTGTGGA TAGCACTGTG AAGGTTTGGA 
2001 CTTGCTAACA GGATCCAGGC CAGGCTGTGG TTTCCCCTGA ACCAGCCCTG 
2051 GACCTTTCTG AGCCAGGCTG GCCACATGGG GTGGTCTCGG GGTTTCTGCC 
2101 TGCCCCGTGG GCATAGGTGG ACAGGCTCTG GCAGCCGGGC AGTGCCCTCC 
2151 CCGTCCCATG CTCGGCGAGC CTCCCTCTAC TCGGCACTGT CCTTGCTGCC 
2201 CAGCCCCTCT CTGGGTGCCA GGTACGACGC TTGCCCCGGC CCACCCTCCA 
2251 TCCCCACCCT CCATCCCCAC CCTAGATGGA GCGAGGGCCT TTTTACTCAC 
2301 CTTTTCTACC GTTTTTAGAC TGTATGTAGA TTTGGTTACC TCCTGGTTGA 
2351 AATAAATGCT CCACAGACTG TGGCTGTGAG TGGGGACAGC TCCTCGGGAC
2401 AAGGGGGCTG TGTGTGGCCT TGAGGTTGGT GTGCACAGGC ACTGGCTGCT
2451 GTGAGTGGGG GGGCATGGGG CAGTTTCCTT TGGTGGACCC CAGGACTTCG 
2501 GCCCACTCCG GGGCCTCCCC TCCCTGCTAG GAGGCAACTC GTCACACCCA 
2551 AGCTGCTGGC CTCCAGTCCC ATCTCCCCCA ACACATGTGC CCCCAAAAAG 
2601 TGAGCCAGGC ACCTCTGTTT CCTGCTGTTT ATTGACAGCC GACGGCAGCG 
2651 CCTTGCCCAG ACCTCCCCTG CCCACCTGCT GGAGCCCAGC CTGTGCCGCC 
2701 CTCTGAGGAG AGGCCTGGGG GGACAGCTGG GCACGTCCAC TCGCAGGGAA 
2751 ACACGGGGTG AGACAGCAGG AAGGGGCCCT GCACGCCGGG ACGCCACCTC 
2801 CGCCAGCCGC CTCCACCCGC CCCACACCAC AATCGCTGGT TTTCGGCATT 
2851 TTTTAAATTT TTTTTTTAAG AAACGTCAAA GTTGTGCCCA ACACTGTGGA 
2901 TCAGCAAACA CGATAGAGGA GACCAGTCAG TACTTCTTGG AGGGGGCAGG 
2951 AGGAGAGAGG AAAAGGGAGG GCGAGAATGA CCACACAACA CAGCCTTGGA 
3001 CCATGAGCAG AAGCGTCCGT GGGAACTCCA CTGGGGTGGA TGGGCTGCCT 
3051 GCACAGCCCC TGGAGAGGGG GCCAGGCACA CCCTCAGAGG AGCTGCAAGC 
3101 CCGTGGCCTG GCCTGCTACA TGCCCTGCTT CCACGTGGCT GCCACGCTGA 
3151 CACACCCACA TTCACCAAAC CCACCCGCGC CCTGGGACGC AGCCACGCCA 
3201 GGAGGAGGAC ACGGCCGCCG AGAGCAAGGC ACAACCTCGA GTTCTTGGGG 
3251 CGCAGAGAAC TTAGGAGAGA AGCACGGAGG AGCCCCCGGC AGAGCACCCG 
3301 CCCCCGGGCC CCAGCCTTCC ACCTGTGCTA GCAGCCTGGG GCCTCCACTC 
3351 TGGCCGGAGG AAGGACCGCA GGCAGACAGC CTGGGCCTCT AACAGCTTTT 
3401 GTCCGGAGCT AGACTTCGTG TCCTTTCAGT TGGTAAATGG TTTTCTATAG 
3451 AATCAATAAT ATTTCTTTCT TTAAATATAT ATTTGTTAAA GTTATACCTT 
3501 TTTGTTTCTC TGGGGAAATC CGCCTCAGCT CATTCCCAAT AAATTAATAC 
3551 TCTTGATAAA AAAAAAAAAA AGAAAAAAAA AAAA 


BLAST Results 


Entry HSBE from database EMBL: 

Homo sapiens (clone exon trap dS) chromosome 16p13.3 gene, exon.
Score = 2375, P = 7.1e-101, identities = 475/475

Entry HSBD from database EMBL:

Homo sapiens (clone exon trap d32) chromosome 16p13.3 gene, exon.
Score = 876, P = 3.0e-31, identities = 176/177


Medline entries 


95122486: 

Structural analysis of myosin heavy chain kinase A from
Dictyostelium. Evidence for a highly divergent protein kinase
domain, an amino-terminal coiled-coil domain, and a
domain homologous to the beta-subunit of heterotrimeric G
proteins.

96149460:

Dictyostelium myosin heavy chain kinase A regulates myosin localization
during growth and development.

97277316:

Identification of a protein kinase from Dictyostelium with homology to
the novel catalytic domain of myosin heavy chain kinase A.

96009891:

A gene responsible for vegetative incompatibility in the fungus
Podospora anserina encodes a protein with a GTP-binding motif and
G beta homologous domain.


Peptide information for frame 2 


ORF from 224 bp to 2005 bp; peptide length: 594
Category: similarity to known protein
Prosite motifs: ZINC_FINGER_C3HC4 (70-80)
LEUCINE_ZIPPER (436-458)
G_BETA_REPEATS (335-355)
G_BETA_REPEATS (376-391)


1 MPPISTPRRS DSAISVRSLH SESSMSLRST FSLPEEEEEP EPLVFAEQPS

51 VKLCCQLCCS VFKDPVITTC GHTFCRRCAL KSEKCPVDNV KLTVVVNNIA

101 VAEQIGELFI HCRHGCRVAG SGKPPIFEVD PRGCPFTIKL SARKDHEGSC 

151 DYRPVRCPNN PSCPPLLRMN LEAHLKECEH IKCPHSKYGC TFIGNQDTYE 

201 THLETCRFEG LKEFLQQTDD RFHEMHVALA QKDQEIAFLR SMLGKLSEKI 

251 DQLEKSLELK FDVLDENQSK LSEDLMEFRR DASMLNDELS HINARLNMGI 

301 LGSYDPQQIF KCKGTFVGHQ GPVWCLCVYS MGDLLFSGSS DKTIKVWDTC

351 TTYKCQKTLE GHDGIVLALC IQGCKLYSGS ADCTIIVWDI QNLQKVNTIR 

401 AHDNPVCTLV SSHNVLFSGS LKAIKVWDIV GTELKLKKEL TGLNHWVRAL 

451 VAAQSYLYSG SYQTIKIWDI RTLDCIHVLQ TSGGSVYSIA VTNHHIVCGT

501 YENLIHVWDI ESKEQVRTLT GHVGTVYALA VISTPDQTKV FSASYDRSLR 

551 VWSMDNMICT QTLLRHQGSV TALAVSRGRL FSGAVDSTVK VWTC 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_li2, frame 2

SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK
B)., N = 1, Score = 419, P = 3.6e-37

SWISSPROT:HET1_PODAN VEGETATIVE INCOMPATIBILITY PROTEIN HET-E-1., N =
1, Score = 392, P = 3.1e-33

SWISSPROT:YDJ5_SCHPO HYPOTHETICAL 67.1 KD TRP-ASP REPEATS CONTAINING
PROTEIN C57A10.05C IN CHROMOSOME I., N = 1, Score = 357, P = 4.1e-30

TREMBL:AF032878_1 gene: "slimb"; product: "Slimb"; Drosophila
melanogaster Slimb (slimb) mRNA, complete cds., N = 1, Score = 347, P =
1.7e-29

>SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B).
Length = 732

HSPs:

Score = 419 (62.9 bits), Expect = 3.6e-37, P = 3.6e-37
Identities = 96/268 (35%), Positives = 158/268 (58%)

Query: 325 CLCVYSMGDLLFSGSSDKTIKVWD-TCTTYKCQKTLEGHDGIVLALCIQGCKLYSGSADC 383

C+C +LLF+G SD +I+V+D +C +TL+GH+G V ++C L+SGS+D
Sbjct: 467 CIC----DNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESICYNDQYLFSGSSDH 522

Query: 384 TIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KAIKVWDIVGTELKLKKELTG 442

+I VWD++ L+ + T+ HD PV T++ + LFSGS K IKVWD+ L+ K L
Sbjct: 523 SIKVWDLKKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKVWDL--KTLECKYTLES 580

Query: 443 LNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTY 501

V+ L + YL+SGS +TIK+WD++T C + L+ V +I + ++ G+Y
Sbjct: 581 HARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVTTICILGTNLYSGSY 640

Query: 502 ENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFSASYDRSLRVWSMDNMICTQ 561

+ I VW+++S E TL GH V + + D+ +F+AS D ++++W ++ + C
Sbjct: 641 DKTIRVWNLKSLECSATLRGHDRWVEHMVIC---DKL-LFTASDDNTIKIWDLETLRCNT 696

Query: 562 TLLRHQGSVTALAVSRGR--LFSGAVDSTVKVW 592

TL H +V LAV + + S + D +++VW
Sbjct: 697 TLEGHNATVQCLAVWEDKKCVISCSHDQSIRVW 729

Score = 415 (62.3 bits), Expect = 1.2e-36, P = 1.2e-36
Identities = 113/303 (37%), Positives = 166/303 (54%)

Query: 255 KSLEL-KFDVLDENQSKLSEDLMEFRRDASMLNDEL-SHINARLNMGILGS-------YD 305

KS++L K ++L N+ K S +L + ++ + SH+ N+ G YD 
Sbjct: 427 KSIDLEKPEILINNKKKESINLETIKLIETIKGYHVTSHLCICDNLLFTGCSDNSIRVYD 486 

Query: 306 -PQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDG 364 

Q +C T GH+GPV +C Y+ LFSGSSD +IKVWD +C TLEGHD 

Sbjct: 487 YKSQNMECVQTLKGHEGPVESIC-YN-DQYLFSGSSDHSIKVWDL-KKLRCIFTLEGHDK 543 

Query: 365 IVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KA 423 

V + + L+SGS+D TI VWD++ L+ T+ +H V TL S LFSGS K 
Sbjct: 544 PVHTVLLNDKYLFSGSSDKTIKVWDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKT 603 

Query: 424 IKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTS 482 

IKVWD+ + L G WV + + LYSGSY +TI++W++++L+C L+ 

Sbjct: 604 IKVWDL--KTFRCNYTLKGHTKWVTTICILGTNLYSGSYDKTIRVWNLKSLECSATLRGH 661
Query: 483 GGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFS 542

V+ + + + ++NI +WD+E+ TL GH TV LAV D+ V S
Sbjct: 662 DRWVEHMVICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWE--DKKCVIS 719

Query: 543 ASYDRSLRVW 552

S+D+S+RVW
Sbjct: 720 CSHDQSIRVW 729

Score = 262 (39.3 bits), Expect = 3.2e-19, P = 3.2e-19
Identities = 60/184 (32%), Positives = 109/184 (59%)

Query: 352 TYKCQKTLEGHDGIVLALCIQGCKLYSGSADCTIIVWDI--QNLQKVNTIRAHDNPVCTL 409

T K +T++G+ + LCI L++G +D +I V+D QN++ V T++ H+ PV ++
Sbjct: 450 TIKLIETIKGYH-VTSHLCICDNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESI 508

Query: 410 VSSHNVLFSGSLK-AIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKI 467

+ LFSGS +IKVWD+ +L+ L G + V ++ YL+SGS +TIK+
Sbjct: 509 CYNDQYLFSGSSDHSIKVWDL--KKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKV 566

Query: 468 WDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVY 527

WD++TL+C + L++ +V ++ ++ ++ G+ + I VWD+++ TL GH V
Sbjct: 567 WDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVT 626

Query: 528 ALAVIST 534

+ ++ T
Sbjct: 627 TICILGT 633

Score = 173 (26.0 bits), Expect = 1.7e-09, P = 1.7e-09
Identities = 43/118 (36%), Positives = 65/118 (55%)

Query: 310 FKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDGIVLAL 369

F+C T GH V +C+ +G L+SGS DKTI+VW+ + +C TL GHD V +
Sbjct: 612 FRCNYTLKGHTKWVTTICI--LGTNLYSGSYDKTIRVWNL-KSLECSATLRGHDRWVEHM 668

Query: 370 CIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPV-CTLVSSHN--VLFSGSLKAIKV 426

I L++ S D TI +WD++ L+ T+ H+ V C V V+ ++I+V
Sbjct: 669 VICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWEDKKCVISCSHDQSIRV 728

Query: 427 W 427

Sbjct: 729 W 729


Pedant information for DKFZphutel_li2, frame 2


Report for DKFZphutel_li2.2


[LENGTH] 594
[MW] 66541.94
[pI] 6.64
[HOMOL] SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B). 3e-
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YCR072c beta-transducin family] 2e-15
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 1e-13
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 1e-13
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 3e-11
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 8e-09
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-07
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-07
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 3e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021c] 2e-04
[FUNCAT] 01.03.07 deoxyribonucleotide metabolism [S. cerevisiae, YOR269w] 2e-04
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 0.001
[BLOCKS] BL00678
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins
[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-10
[EC] 2.7.1.129 Myosin-heavy-chain kinase 3e-26
[PIRKW] phosphotransferase 3e-26
[PIRKW] nucleus 1e-06
[PIRKW] plasma 9e-08
[PIRKW] duplication 3e-25
[PIRKW] hormone 9e-08
[PIRKW] zinc 3e-09
[PIRKW] cell cycle control 4e-13
[PIRKW] transmembrane protein 3e-12
[PIRKW] zinc finger 1e-08
[PIRKW] stomach 9e-08
[PIRKW] DNA binding 9e-66
[PIRKW] autophosphorylation 3e-26
[PIRKW] phosphoprotein 3e-26
[PIRKW] signal transduction 5e-08
[PIRKW] heterotrimer 5e-08
[PIRKW] coiled coil 3e-26
[PIRKW] multimer 3e-26
[PIRKW] transcription regulation 4e-10
[PIRKW] GTP binding 5e-08
[SUPFAM] chromobox homology 9e-06
[SUPFAM] RING finger homology 3e-09
[SUPFAM] coatomer complex beta' chain 1e-07
[SUPFAM] WD repeat homology 3e-26
[SUPFAM] yeast coatomer complex alpha chain 3e-12
[SUPFAM] GTP-binding regulatory protein beta chain 5e-08
[SUPFAM] PRL1 protein 2e-09
[PROSITE] WD_REPEATS 2
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 14
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] ZINC_FINGER_C3HC4 1
[PROSITE] PKC_PHOSPHO_SITE 18
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Zinc finger, C3HC4 type (RING finger)
[PFAM] WD domain, G-beta repeats
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 6.23 %
[KW] COILED_COIL 6.73 %


SEQ MPPISTPRRSDSAISVRSLHSESSMSLRSTFSLPEEEEEPEPLVFAEQPSVKLCCQLCCS

SEG xxxxxxxxxxxxxxx .... xxxxxxxxx

COILS

1gg2B

SEQ VFKDPVITTCGHTFCRRCALKSEKCPVDNVKLTVVVNNIAVAEQIGELFIHCRHGCRVAG

SEG

COILS

1gg2B

SEQ SGKPPIFEVDPRGCPFTIKLSARKDHEGSCDYRPVRCPNNPSCPPLLRMNLEAHLKECEH

SEG

COILS

1gg2B

SEQ IKCPHSKYGCTFIGNQDTYETHLETCRFEGLKEFLQQTDDRFHEMHVALAQKDQEIAFLR

SEG

COILS cccccccccccccc

1gg2B

SEQ SMLGKLSEKIDQLEKSLELKFDVLDENQSKLSEDLMEFRRDASMLNDELSHINARLNMGI

SEG

COILS cccccccccccccccccccccccccc

1gg2B

SEQ LGSYDPQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLE

SEG

COILS

1gg2B EECCCCCCEEEEEETTTTCEEEEEETTTEEEEEEG-GGCEEEEEEE

SEQ GHDGIVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGS

SEG

COILS

1gg2B CCCCCEEEEEETTCEEEEEETTTCEEEEETTTTEEEEEE-CTTTTCCCEEE.

SEQ LKAIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSYQTIKIWDIRTLDCIHVLQ

SEG xxxxxxxxxxxxx

COILS

1gg2B

SEQ TSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKV

SEG

COILS

1gg2B

SEQ FSASYDRSLRVWSMDNMICTQTLLRHQGSVTALAVSRGRLFSGAVDSTVKVWTC

SEG

COILS

1gg2B


Prosite for DKFZphutel_li2.2 


PS00001 267->271 ASN_GLYCOSYLATION PDOC00001
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005
PS00005 15->18 PKC_PHOSPHO_SITE PDOC00005
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 50->53 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 121->124 PKC_PHOSPHO_SITE PDOC00005
PS00005 137->140 PKC_PHOSPHO_SITE PDOC00005
PS00005 141->144 PKC_PHOSPHO_SITE PDOC00005
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005
PS00005 340->343 PKC_PHOSPHO_SITE PDOC00005
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005
PS00005 352->355 PKC_PHOSPHO_SITE PDOC00005
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005
PS00005 420->423 PKC_PHOSPHO_SITE PDOC00005
PS00005 464->467 PKC_PHOSPHO_SITE PDOC00005
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005
PS00005 588->591 PKC_PHOSPHO_SITE PDOC00005
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006
PS00006 330->334 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00008 115->121 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 194->200 MYRISTYL PDOC00008
PS00008 299->305 MYRISTYL PDOC00008
PS00008 314->320 MYRISTYL PDOC00008
PS00008 364->370 MYRISTYL PDOC00008
PS00008 379->385 MYRISTYL PDOC00008
PS00008 419->425 MYRISTYL PDOC00008
PS00008 460->466 MYRISTYL PDOC00008
PS00008 484->490 MYRISTYL PDOC00008
PS00008 499->505 MYRISTYL PDOC00008
PS00008 524->530 MYRISTYL PDOC00008
PS00008 568->574 MYRISTYL PDOC00008
PS00008 583->589 MYRISTYL PDOC00008
PS00518 70->80 ZINC_FINGER_C3HC4 PDOC00449
PS00029 436->458 LEUCINE_ZIPPER PDOC00029
PS00678 335->350 WD_REPEATS PDOC00574
PS00678 376->391 WD_REPEATS PDOC00574
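
The PKC_PHOSPHO_SITE rows in the Prosite listing above can be reproduced with a short pattern scan. This is a minimal sketch, not the tool that generated the report: the function name `scan_pkc_sites` is hypothetical, the PROSITE PS00005 pattern [ST]-x-[RK] is written as a regular expression, and the end coordinate is taken as start plus motif length because that is the convention the listing appears to use (for example 6->9 for a three-residue site).

```python
import re

# PROSITE PS00005 (PKC_PHOSPHO_SITE): [ST]-x-[RK].
# The lookahead lets overlapping sites be reported as separate hits.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def scan_pkc_sites(peptide: str):
    """Return (start, end) pairs, 1-based, with end = start + motif length."""
    hits = []
    for match in PKC_SITE.finditer(peptide):
        start = match.start() + 1
        end = start + len(match.group(1))
        hits.append((start, end))
    return hits


# Residues 1-30 of the frame 2 peptide listed earlier in this entry.
n_terminus = "MPPISTPRRSDSAISVRSLHSESSMSLRST"
print(scan_pkc_sites(n_terminus))  # [(6, 9), (15, 18), (26, 29)]
```

The three hits correspond to the first three PS00005 rows of the listing.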


Pfam for DKFZphutel_li2.2


HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

++GH ++VWC+ + G + ++SGS D+T+++WD

Query 316 FVGHQGPVWCLCVYSMGDL-LFSGSSDKTIKVWD 348 

22.93 519 553 1 34 dkfzphutel_li2.2 similarity to Dictyostelium myosin heavy chain kinase

Alignment to HMM consensus:
Query *MrGHnnWVWCVaF..SPDGrWFIvSGSWDgTCRLWD*

++GH ++V+++A+ +PD ++S+S D+++R+W+
dkfzphutel 519 LTGHVGTVYALAVISTPDQTK-VFSASYDRSLRVWS 553


HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW. .CPmC* 

C++C + F++P++++CGH+FC+ C +++ CP+ 

Query 55 CQLC CSV--- FKDPVITTCGHTFCRRCALKSEKCPVD 88

DKFZphutel_20bl9


group: metabolism 

DKFZphutel_20bl9 encodes a novel 486 amino acid protein with similarity to bacterial sarcosine
oxidases (EC 1.5.3.1).

The novel protein appears to be a new enzyme with sarcosine oxidase activity.

The new protein can find application in modulating sarcosine metabolism and as a new enzyme
for biotechnological production processes.


similarity to sarcosine oxidases 
membrane regions: 1

Summary DKFZphutel_20bl9 encodes a novel 486 amino acid protein, with 
similarity to sarcosine oxidases. 


similarity to sarcosine oxidases 

complete cDNA?, complete cds, potential start at bp 48, EST hits

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1967 bp 

Poly A stretch at pos. 1950, no polyadenylation signal found 


1 AGCGAGGCAG CAGTGCAGCT TTCAGAGGGT CCGGGCTCAG AGGGGTTATG 
51 ATTCGGAGGG TTCTGCCGCA CGGCATGGGC CGGGGCCTCT TGACCCGGAG 
101 GCCAGGCACG CGCAGAGGAG GCTTTTCTCT GGACTGGGAT GGAAAGGTGT 
151 CTGAGATTAA GAAGAAGATC AAGTCGATCC TGCCTGGAAG GTCCTGTGAT 
201 CTACTGCAAG ACACCAGCCA CCTGCCTCCC GAGCACTCGG ATGTGGTGAT 
251 CGTGGGAGGT GGGGTGCTTG GCTTGTCTGT GGCCTATTGG CTGAAGAAGC 
301 TGGAGAGCAG ACGAGGTGCT ATTCGAGTGC TAGTGGTGGA ACGGGACCAC 
351 ACGTATTCAC AGGCCTCCAC TGGGCTCTCA GTAGGTGGGA TTTGTCAGCA 
401 GTTCTCATTC CCTGAGAACA TCCAGCTCTC CCTCTTTTCA GCCAGCTTTC 
451 TACGGAACAT CAATGAGTAC CTGGCCGTAG TCGATGCTCC TCCCCTGGAC 
501 CTCCGGTTCA ACCCCTCGGG CTACCTCTTG CTGGCTTCAG AAAAGGATGC 
551 TGCAGCCATG GAGAGCAACG TGAAAGTGCA GAGGCAGGAG GGAGCCAAAG 
601 TTTCTCTGAT GTCTCCTGAT CAGCTTCGGA ACAAGTTTCC CTGGATAAAC 
651 ACAGAGGGAG TGGCTTTGGC GTCTTATGGG ATGGAGGACG AAGGTTGGTT 
701 TGACCCCTGG TGTCTGCTCC AGGGGCTTCG GCGAAAGGTC CAGTCCTTGG 
751 GAGTCCTTTT CTGCCAGGGA GAGGTGACAC GTTTTGTCTC TTCATCTCAA 
801 CGCATGTTGA CCACAGATGA CAAAGCGGTG GTCTTGAAAA GGATCCATGA 
851 AGTCCATGTG AAGATGGACC GCAGCCTGGA GTACCAGCCT GTGGAATGCG 
901 CCATTGTGAT CAACGCAGCC GGAGCCTGGT CTGCGCAAAT CGCAGCACTG 
951 GCTGGTGTTG GAGAGGGGCC GCCTGGCACC CTGCAGGGCA CCAAGCTACC 
1001 TGTGGAGCCG AGGAAAAGGT ATGTGTATGT GTGGCACTGC CCCCAGGGAC 
1051 CAGGCCTAGA GACTCCGCTT GTTGCAGACA CCAGTGGAGC CTATTTTCGC 
1101 CGGGAAGGAT TAGGTAGCAA CTACCTAGGT GGTCGTAGCC CCACTGAGCA 
1151 GGAAGAACCG GACCCGGCGA ACCTGGAAGT GGACCATGAT TTCTTCCAGG 
1201 ACAAGGTGTG GCCCCATTTG GCCCTGAGGG TCCCAGCTTT TGAGACTCTG 
1251 AAGGTTCAGA GCGCCTGGGC CGGCTATTAC GACTACAACA CCTTTGACCA 
1301 GAATGGCGTG GTGGGCCCCC ACCCGCTAGT TGTCAACATG TACTTTGCTA 
1351 CTGGCTTCAG TGGTCACGGG CTCCAGCAGG CCCCTGGCAT TGGGCGAGCT 
1401 GTAGCAGAGA TGGTACTGAA GGGCAGGTTC CAGACCATCG ACCTGAGCCC 
1451 CTTCCTCTTT ACCCGCTTTT ACTTGGGAGA GAAGATCCAG GAGAACAACA
1501 TCATCTGAGC ATGTGTGCTC TGCACTGGCT CCACTGGCTT GCATCCTGGC 
1551 TGTGTTCACA GCCTTGTTTG CTGCTTCCAT CTTCCCCAGT ACTGTGCCAG 
1601 GCCTTCTCCC CCTCCCCAGT GTCCTCTCCT CTCAGGCAGG CCATTGCACC 
1651 CATATGGCTG GGCAGGCACA GGCAGTGAGG CCGAGGCCAA TAGCGAGTGA 
1701 TGAGCGGGAT CCTAGGACTG ATCTGTAGCC CATGCTGATG TCACCCACCA 
1751 GGGCAATCCA TCTGGAGGCC TGAGCACCCT GGCCCAGGAC TGGCTTCATC 
1801 CTGGCACTGA CCAGGAAAGA CTGCCTCTGA CCCTCTTAGC AGACAGAGCC 
1851 CAGGCATGGG AGCACTCTGG GGCAGCCTGG CTCAGGTTTA TTGATTTTCG 
1901 TCTGTTTACC CTATCCATTA ATCAATACAT GTAATTAACT CCTTCCCTCC 
1951 AAAAAAAAAA AAAAAAA 


BLAST Results 


No BLAST result 
Medline entries


No Medline entry 


Peptide Information for frame 3 


ORF from 48 bp to 1505 bp; peptide length: 486 
Category: similarity to known protein 


1 MIRRVLPHGM GRGLLTRRPG TRRGGFSLDW DGKVSEIKKK IKSILPGRSC 

51 DLLQDTSHLP PEHSDVVIVG GGVLGLSVAY WLKKLESRRG AIRVLVVERD 

101 HTYSQASTGL SVGGICQQFS LPENIQLSLF SASFLRNINE YLAVVDAPPL 

151 DLRFNPSGYL LLASEKDAAA MESNVKVQRQ EGAKVSLMSP DQLRNKFPWI 

201 NTEGVALASY GMEDEGWFDP WCLLQGLRRK VQSLGVLFCQ GEVTRFVSSS 

251 QRMLTTDDKA VVLKRIHEVH VKMDRSLEYQ PVECAIVINA AGAWSAQIAA 

301 LAGVGEGPPG TLQGTKLPVE PRKRYVYVWH CPQGPGLETP LVADTSGAYF 

351 RREGLGSNYL GGRSPTEQEE PDPANLEVDH DFFQDKVWPH LALRVPAFET 

401 LKVQSAWAGY YDYNTFDQNG VVGPHPLVVN MYFATGFSGH GLQQAPGIGR 

451 AVAEMVLKGR FQTIDLSPFL FTRFYLGEKI QENNII 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20bl9, frame 3 

TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2, 
N = 1, Score = 801, P = 9.2e-80 

PIR:B71184 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, 
Score = 194, P = 2e-26 

PIR:B69284 sarcosine oxidase, subunit beta (soxB) homolog - 
Archaeoglobus fulgidus, N = 3, Score = 189, P = 8.2e-22 

TREMBL:AF042732_1 gene: "Bb"; product: "unknown protein"; Anopheles 
gambiae (Bb) gene, partial cds; and TU37B2 (TU37B2) and diphenol 
oxidase-A2 (Dox-A2) genes, complete cds., N = 1, Score = 386, P = 
8.7e-36 

PIR:F71008 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, 
Score = 200, P = 4e-25 


>TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 
Length = 527 

HSPs: 

Score = 801 (120.2 bits), Expect = 9.2e-80, P = 9.2e-80 
Identities = 171/433 (39%), Positives = 260/433 (60%) 

Query: 61 PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 120 

P +++VI+GGG+ G S A+WLK+ R +V+VVE + ++++ST LS GGI QQFS 
Sbjct: 91 PYRAEIVIIGGGLSGSSTAFWLKE-RFRDEDFKVVVVENNDVFTKSSTMLSTGGITQQFS 149 

Query: 121 LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLA-SEKDAAAMESNVKVQR 179 

+PE + +SLF+ FLR+ E+L ++D+ D+ F P+GYL LA ++++ M S KVQ 
Sbjct: 150 IPEFVDMSLFTTEFLRHAGEHLRILDSEQPDINFFPTGYLRLAKTDEEVEMMRSAWKVQI 209 

Query: 180 QEGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFC 239 

+ GAKV L+S D+L ++P++N + V LAS G+E+EG D W LL +R K +LGV + 
Sbjct: 210 ERGAKVQLLSKDELTKRYPYMNVDDVLLASLGVENEGTIDTWQLLSAIREKNITLGVQYV 269 

Query: 240 QGEVTRFVSSSQRM----------LTTDDKAVVLKRIHEVHVKMDRS-LEYQPVECAIVI 288 

+GEV F R T D+ + +RI V V+ + +P+ +++ 

Sbjct: 270 KGEVEGFQFERHRASSEVHAFGDDATADENKLRAQRISGVLVRP<^DASARPIRAHLIV 329 

Query: 289 NAAGAWSAQIAALAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTS-G 347 

NAAG W+ Q+A +AG+G+G G L +P++PRKR V+V P P 4P+DSG 
Sbjct: 330 NAAGPWAGQVAKMAGIGKGT-GLL-AVPVPIQPRKRDVFVIFAPDVPS-OLPFIIDPSTG 386 

Query: 348 AYFRREGLGSNYLGGRSPTEQEEP--DPANLEVDHDFFQDKVWPHLALRVPAFETLKVQS 405 

+ R+ G +L GR+P+++E+ D +NL+VD+D F K+WP L RVP F+T KV+S 
Sbjct: 387 VFCRQTDSGQTFLVGRTPSKEEDAKRDHSNLDVDYDDFYQKIWPVLVDRVPGFQTAKVKS 446 
Query: 406 AWAGYYDYNTFDQNGVVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTID 465 

AW+GY D NTFD V+G HPL N++ GF G+ + RA AE + G + ++ 
Sbjct: 447 AWSGYQDINTFDDAPVIGEHPLYTNLHMMCGFGERGVMHSMAAARAYAERIFDGAYINVN 506 

Query: 466 LSPFLFTRFYLGEKIQE 482 

L F R + I E 
Sbjct: 507 LRKFDMRRIVKMDPITE 523 
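
The "Identities" figure in an HSP header can be recomputed from the aligned Query and Sbjct rows by counting identical columns. The Python sketch below does this for the last alignment block above and re-derives the rounded percentages quoted in the header (171/433 and 260/433). The function names are hypothetical. "Positives" depend on the substitution matrix used by BLASTP, which the report does not state, so only the percentage arithmetic is shown for them.

```python
def count_identities(query: str, sbjct: str) -> tuple:
    """Return (identical columns, aligned columns) for two equal-length aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # A gap character never counts as an identity.
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer, as in the HSP header."""
    return round(100 * part / whole)


ident, cols = count_identities("LSPFLFTRFYLGEKIQE", "LRKFDMRRIVKMDPITE")
print(ident, cols)        # 5 17
print(percent(171, 433))  # 39
print(percent(260, 433))  # 60
```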

Pedant information for DKFZphutel_20bl9, frame 3 


Report for DKFZphutel_20bl9.3 

[LENGTH] 486 

[MW] 53811.85 

[pI] 7.66 

[HOMOL] TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 1e-78 

[FUNCAT] c energy conversion [H. influenzae, HI0499] 8e-05 

[BLOCKS] BL00677A D-amino acid oxidases proteins 

[BLOCKS] BL00623A GMC oxidoreductases proteins 

[BLOCKS] BL01304A 

[EC] 1.5.99.2 Dimethylglycine dehydrogenase 2e-07 

[PIRKW] flavoprotein 2e-07 

[PIRKW] oxidoreductase 2e-07 

[PROSITE] MYRISTYL 12 

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 6 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 7.00 % 

SEQ MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP 

SEG xxxxxxxxxxxxxxx xxxxxxxx 

PRD ccceeecccccceeecccccccccccccccccchhhhhhhhhhccccccceeeccccccc 
MEM 

SEQ PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 

SEG xxxxxxxxxxx 

PRD cccceeeeeccccchhhhhhhhhhhhhhcccceeeeeeccccccccccccccccceeeec 

MEM MMMMMMMMMMMMMMMMM 


SEQ LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLASEKDAAAMESNVKVQRQ 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhccccceeecccceeeehhhhhhhhhhhhhhhhhh 

MEM 


SEQ EGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFCQ 

SEG 

PRD cccceeecccchhhhhhccccccccccccccccccccccccchhhhhhhhhhhheeeeec 

MEM 


SEQ GEVTRFVSSSQRMLTTDDKAVVLKRIHEVHVKMDRSLEYQPVECAIVINAAGAWSAQIAA 

SEG 

PRD ceeeeecccccccccccchhhhhhhhhheeeecccccccccceeeeeeecccchhhhhhh 

MEM 


SEQ LAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTSGAYFRREGLGSNYL 

SEG 

PRD hhccccccccccccccccccccceeeeeeecccccccccceeeccccceeeeccccccee 

MEM 


SEQ GGRSPTEQEEPDPANLEVDHDFFQDKVWPHLALRVPAFETLKVQSAWAGYYDYNTFDQNG 

SEG 

PRD ecccccccccccccccccccchhhhhhhhhhhhhhhcchhhhhhhhhhheeeeeccccccc 

MEM 


SEQ VVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTIDLSPFLFTRFYLGEKI 

SEG 

PRD cccccccccceeeecccccccccchhhhhhhhhhhhhhccceeeeccccccccccccccc 

MEM 

SEQ QENNII 

SEG 

PRD cccccc 

MEM 
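
The [KW] LOW_COMPLEXITY figure appears to be the share of residues masked with "x" in the SEG rows. As far as the printed rows can be read, they mask runs of 15, 8 and 11 positions, i.e. 34 of 486 residues. A small Python sketch of that calculation follows; the function name is hypothetical, and treating every "x" as one masked residue is an assumption about the report format.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked with 'x' across all SEG rows."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return 100.0 * masked / length


# Mask runs as read from the SEG rows above (15 + 8 in the first row, 11 in the second).
seg_rows = ["x" * 15 + " " + "x" * 8, "x" * 11]
print(f"{low_complexity_percent(seg_rows, 486):.2f} %")  # 7.00 %
```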
Prosite for DKFZphutel_20bl9.3 


PS00002 

438->442 

GLYCOSAMINOGLYCAN 

PDOC00002 

PS00005 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00005 

21->24 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00005 

87->90 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00005 

164->167 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00005 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00005 

PKC_PHOSPHO_SITE 

PDOC00005 

PS00006 

CK2_PHOSPHO_SITE 

PDOC00006 

PS00006 

CK2_PHOSPHO_SITE 

PDOC00006 

PS00006 

255->259 

CK2_PHOSPHO_SITE 

PDOC00006 

PS00006 

364->368 

CK2_PHOSPHO_SITE 

PDOC00006 

PS00006 

366->370 

CK2_PHOSPHO_SITE 

PDOC00006 

PS00008 

9->15 

MYRISTYL 


PDOC00008 

PS00008 

20->26 

MYRISTYL 


PDOC00008 

PS00008 

71->77 

MYRISTYL 


PDOC00008 

PS00008 

75->81 

MYRISTYL 


PDOC00008 

PS00008 

109->115 

MYRISTYL 


PDOC00008 

PS00008 

182->188 

MYRISTYL 


PDOC00008 

PS00008 

204->210 

MYRISTYL 


PDOC00008 

PS00008 

235->241 

MYRISTYL 


PDOC00008 

PS00008 

292->298 

MYRISTYL 


PDOC00008 

PS00008 

310->316 

MYRISTYL 


PDOC00008 

PS00008 

354->360 

MYRISTYL 


PDOC00008 

PS00008 

447->453 

MYRISTYL 


PDOC00008 


(No Pfam data available for DKFZphutel_20bl9.3) 
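
Site lists such as the one above can be reproduced by translating the PROSITE patterns into regular expressions and scanning the peptide with overlapping matches. The sketch below is a minimal Python illustration, not the Pedant implementation. The pattern strings are the published PROSITE definitions for PS00005, PS00006 and PS00008. The coordinate style (1-based start, end reported as start plus pattern length) is an assumption inferred from entries such as 21->24 and 9->15, and the function name `scan` is hypothetical.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",               # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",            # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",   # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(peptide, regex):
    """Return (start, end) pairs for all matches, including overlapping ones."""
    hits = []
    # A lookahead keeps the match zero-width so overlapping sites are all reported.
    for m in re.finditer(f"(?=({regex}))", peptide):
        start = m.start() + 1                       # 1-based start (assumed convention)
        hits.append((start, start + len(m.group(1))))
    return hits


# First 30 residues of the frame 3 peptide listed above.
n_term = "MIRRVLPHGMGRGLLTRRPGTRRGGFSLDW"
for name, regex in PATTERNS.items():
    print(name, scan(n_term, regex))
```

On this fragment the scan reports MYRISTYL sites at 9->15 and 20->26 and a PKC_PHOSPHO_SITE at 21->24, which agree with the entries listed above.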
DKFZphutel_20g21 

group: signal transduction 

Ras is a signal transducing molecule involved in the receptor tyrosine kinase/Ras/MAP kinase 
signalling cascade. Ras proteins bind GDP/GTP and show intrinsic GTPase activity. Mutations in 
[...] potential of ras to transform cultured cells. 

[...] to be a new ras inhibitor protein. 

[...] application in modulating/blocking ras dependent signal transduction pathways. 

Ras inhibitor 

additional 1188 Bp at 5' and 1107 at 3' end in comparison to 122483 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 4137 bp 

Poly A stretch at pos. 4116, no polyadenylation signal found 

1 GGGAGAACTG AAACAGGAGA TGGTGCGGAC AGATGTCAAC CTGGAAAATG 
51 GCCTGGAACC CGCTGAAACC CACAGCATGG TAAGACACAA GGATGGTGGC 
101 TATTCCGAGG AAGAGGACGT GAAGACCTGT GCCCGGGACT CAGGCTATGA 
151 CAGCCTCTCC AACAGGCTCA GCATCTTGGA CCGGCTCCTC CACACCCACC 
201 CCATATGGCT GCAGCTGAGT CTGAGTGAGG AGGAGGCAGC AGAGGTCCTG 
251 CAGGCCCAGC CTCCGGGGAT CTTCCTGGTT CATAAATCTA CCAAGATGCA 
301 GAAGAAAGTC CTCTCCCTCC GCCTGCCCTG TGAATTTGGG GCCCCACTCA 
351 AGGAATTTGC CATAAAGGAA AGCACATACA CCTTTTCCCT GGAAGGCTCA 
401 GGAATCAGTT TCGCAGATTT ATTCCGGCTC ATTGCTTTCT ACTGCATCAG 
451 CAGGGATGTT CTACCATTTA CCTTGAAGTT GCCTTATGCC ATTTCAACAG 
501 CCAAGTCGGA GGCTCAGCTT GAAGAACTGG CCCAGATGGG ACTAAATTTC 
551 TGGAGCTCCC CAGCTGACAG CAAACCCCCG AACCTTCCAC CTCCCCATAG 
601 GCCTCTTTCC TCCGACGGTG TCTGTCCTGC CTCCCTGCGT CAGCTCTGCC 
651 TTATAAATGG AGTGCATTCT ATCAAAACCA GGACGCCTTC AGAGCTGGAG 
701 TGCAGCCAGA CCAACGGGGC CCTGTGCTTT ATTAATCCCC TTTTCTTGAA 
751 AGTGCACAGC CAGGACCTCA GTGGAGGCCT GAAACGGCCG AGCACAAGGA 
801 CTCCCAACGC GAATGGCACG GAGCGGACTC GGTCCCCCCC ACCCAGGCCC 
851 CCGCCACCCG CTATTAATAG TCTCCACACA AGCCCTCGGC TGGCCAGGAC 
901 TGAAACCCAG ACGAGCATGC CAGAAACAGT CAACCATAAC AAACATGGGA 
951 ACGTAGCTCT GCCTGGAACG AAACCAACTC CCATCCCTCC ACCCCGGCTG 
1001 AAGAAGCAGG CTTCTTTTCT GGAAGCAGAG GGCGGTGCAA AGACCTTGAG 
1051 CGGCGGCCGG CCGGGCGCAG GCCCGGAGCT GGAGCTGGGC ACAGCTGGCA 
1101 GCCCAGGTGG GGCCCCGCCT GAGGCCGCCC CGGGGGATTG CACAAGGGCC 
1151 CCGCCGCCCA GCTCTGAATC ACGGCCCCCG TGCCATGGAG GCCGGCACCG 
1201 GCTGAGCGAC ATGAGCATTT CTACTTCCTC CTCCGACTCG CTGGAGTTCG 
1251 ACCGGAGCAT GCCTCTGTTT GGCTACGAGG CGGACACCAA CAGCAGCCTG 
1301 GAGGACTACG AGGGGGAAAG TGACCAAGAG ACCATGGCGC CCCCCATCAA 
1351 GTCCAAAAAG AAAAGGAGCA GCTCCTTCGT GCTGCCCAAG CTCGTCAAGT 
1401 CCCAGCTGCA GAAGGTGAGC GGGGTGTTCA GCTCCTTCAT GACCCCGGAG 
1451 AAGCGGATGG TCCGCAGGAT CGCCGAGCTT TCCCGGGACA AATGCACCTA 
1501 CTTCGGGTGC TTAGTGCAGG ACTACGTGAG CTTCCTGCAG GAGAACAAGG 
1551 AGTGCCACGT GTCCAGCACC GACATGCTGC AGACCATCCG GCAGTTCATG 
1601 ACCCAGGTCA AGAACTATTT GTCTCAGAGC TCGGAGCTGG ACCCCCCCAT 
1651 CGAGTCGCTG ATCCCTGAAG ACCAAATAGA TGTGGTGCTG GAAAAAGCCA 
1701 TGCACAAGTG CATCTTGAAG CCCCTCAAGG GGCATGTGGA GGCCATGCTG 
1751 AAGGACTTTC ACATGGCCGA TGGCTCATGG AAGCAACTCA AGGAGAACCT 
1801 GCAGCTTGTG CGGCAGAGGA ATCCGCAGGA GCTGGGGGTC TTCGCCCCGA 
1851 CCCCTGATTT TGTGGATGTG GAGAAAATCA AAGTCAAGTT CATGACCATG 
1901 CAGAAGATGT ATTCGCCGGA AAAGAAGGTC ATGCTGCTGC TGCGGGTCTG 
1951 CAAGCTCATT TACACCGTCA TGGAGAACAA CTCAGGGAGG ATGTATGGCG 
2001 CTGATGACTT CTTGCCAGTC CTGACCTATG TCATAGCCCA GTGTGACATG 
2051 CTTGAATTGG ACACTGAAAT CGAGTACATG ATGGAGCTCC TAGACCCATC 
2101 GCTGTTACAT GGAGAAGGAG GCTATTACTT GACAAGCGCA TATGGAGCAC 
2151 TTTCTCTGAT AAAGAATTTC CAAGAAGAAC AAGCAGCGCG ACTGCTCAGC 
2201 TCAGAAACCA GAGACACCCT GAGGCAGTGG CACAAACGGA GAACCACCAA 
2251 CCGGACCATC CCCTCTGTGG ACGACTTCCA GAATTACCTC CGAGTTGCAT 
2301 TTCAGGAGGT CAACAGTGGT TGCACAGGAA AGACCCTCCT TGTGAGACCT 
2351 TACATCACCA CTGAGGATGT GTGTCAGATC TGCGCTGAGA AGTTCAAGGT 
2401 GGGGGACCCT GAGGAGTACA GCCTCTTTCT CTTCGTTGAC GAGACATGGC 
2451 AGCAGCTGGC AGAGGACACT TACCCTCAAA AAATCAAGGC GGAGCTGCAC 
2501 AGCCGACCAC AGCCCCACAT CTTCCACTTT GTCTACAAAC GCATCAAGAA 
2551 CGATCCTTAT GGCATCATTT TCCAGAACGG GGAAGAAGAC CTCACCACCT 
2601 CCTAGAAGAC AGGCGGGACT TCCCAGTGGT GCATCCAAAG GGGAGCTGGA 
2651 AGCCTTGCCT TCCCGCTTCT ACATGCTTGA GCTTGAAAAG CAGTCACCTC 
2701 CTCGGGGACC CCTCAGTGTA GTGACTAAGC CATCCACAGG CCAACTCGGC 
2751 CAAGGGCAAC TTTAGCCACG CAAGGTAGCT GAGGTTTGTG AAACAGTAGG 
2801 ATTCTCTTTT GGCAATGGAG AATTGCATCT GATGGTTCAA GTGTCCTGAG 
2851 ATTGTTTGCT ACCTACCCCC AGTCAGGTTC TAGGTTGGCT TACAGGTATG 
2901 TATATGTGCA GAAGAAACAC TTAAGATACA AGTTCTTTTG AATTCAACAG 
2951 CAGATGCTTG CGATGCAGTG CGTCAGGTGA TTCTCACTCC TGTGGATGGC 
3001 TTCATCCCTG CCTTCCTTCC TTTCTTTTTC CTTTTTTTTT TTTTTTTTTT 
3051 TTTTTACAAA GAGCCTTCAT GTTTTTATAT ATTTCATAGA AATTTTTATA 
3101 GCAGTTGCAG GTAAACTGTC AGGATTGGTT TTAAAATATT TTTGTAACTT 
3151 TAAAATATTC TATAATTATG CATGTGATTT TAACATTTAA TATTCAAAAA 
3201 TAAATCTCTT GCTGGATTTG AGAGTATTGC ATTTTTAAAG TCTCTCTTCT 
3251 GTAACTGGAT GTTTTGGCAA CTTTGTGGGG AGAGACTGCT GGATTTCTTA 
3301 AAGCAACGTA TTCCTGACAC TGGCCACAGA ATGCCTTTGG AAATCGGATG 
3351 TACTGTTCTC TTGTTCACGT TTAGTGGTGT TTTGCTGTTT TGTTTTTTAA 
3401 ACAAATGATG CTGAGAATAA GGAGAGAAAT GAATGTAGAG AGAGGTAGAG 
3451 AGAGAAATAT GAACTCTAAC AAAGGACTGA GGAGTGCAGT CTGCTGGTTC 
3501 AGGCTCTTCA AAAGATGTAG AAAAAGAGAT AGAAGGAACC ACCTATGCTT 
3551 AAAATACTGT AAATATGCAG TGAGGTTTGG CAAAATCTAT TCCATGTGTG 
3601 ATTTGCTTGT AGAAACAATT TTGAAAGCCC CTTGAGGAAA ATAAAAATCA 
3651 AGAAGAACAC TTTTCTCCCT TTTCCATACA AATTAAAACT TAACAGCATC 
3701 AAATTATTGG GACCAGAAAC CAAGTAATGT ATAATGTGGC TTTTGTTGAG 
3751 TTAAATAAGA TGCTATATAA TGGAGAAGAA TTTGAAAATG CACAAAAAAA 
3801 TCAATCTACA TTATCAGAAC CTGCAGTGAA ATTAAACTTA TGTTAAATAA 
3851 AACCAGTTTG CAGGTGCACA AACTATGAGG GTCTTGTATC CACGTAACAC 
3901 AGGTAGTTAC TUU^CATGT TATTGTACTG TGTAAAGATG CATAGTCATC- 
3951 TCATTTGGTT GGCTTTGTAC CTTGTACCTT TTTTAGCCTT GGCTTTTGTT 
4001 GAACTAGAAC CCTCAGCACA TACTGTGTTG TACTTTTGTA AATGATTTTT 
4051 TAAATGGAAT TTTGCACATA ATACATTGTA ATACTGTATG ATAATCATGT 
4101 GTGAAAATAA TTTTTGAAAT AAAAAAAAAA AAAAAAA 
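
A listing in this layout (a start coordinate followed by blocks of ten bases) can be joined into one contiguous string and checked for consistency. The Python sketch below is one way to do that; the function names are hypothetical. It verifies that each row's start coordinate matches the running length and reports any character outside A, C, G and T, which is a quick way to locate scanning damage in a listing.

```python
def parse_listing(rows):
    """Join numbered rows of base blocks into one sequence string."""
    seq = ""
    for row in rows:
        fields = row.split()
        if not fields:
            continue                      # skip blank rows
        start, blocks = int(fields[0]), fields[1:]
        if start != len(seq) + 1:
            raise ValueError(f"row starting at {start} does not follow base {len(seq)}")
        seq += "".join(blocks)
    return seq


def non_acgt_positions(seq):
    """1-based positions of characters that are not A, C, G or T."""
    return [i + 1 for i, base in enumerate(seq) if base not in "ACGT"]


rows = [
    "1 GGGAGAACTG AAACAGGAGA TGGTGCGGAC AGATGTCAAC CTGGAAAATG",
    "51 GCCTGGAACC CGCTGAAACC CACAGCATGG TAAGACACAA GGATGGTGGC",
]
seq = parse_listing(rows)
print(len(seq), seq[19:22])       # 100 ATG
print(non_acgt_positions(seq))    # []
```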


BLAST Results 


Entry 122483 from database EMBL: 
Sequence 15 from patent US 5527896. 

Length = 1829 
Plus Strand HSPs: 

Score = 9097 (1364.9 bits), Expect = 0.0, P = 0.0 
Identities = 1821/1823 (99%), Positives = 1821/1823 (99%) 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 20 bp to 2602 bp; peptide length: 861 
Category: known protein 

Classification: Cell signaling/communication 


1 MVRTDVNLEN GLEPAETHSM VRHKDGGYSE EEDVKTCARD SGYDSLSNRL 

51 SILDRLLHTH PIWLQLSLSE EEAAEVLQAQ PPGIFLVHKS TKMQKKVLSL 

101 RLPCEFGAPL KEFAIKESTY TFSLEGSGIS FADLFRLIAF YCISRDVLPF 

151 TLKLPYAIST AKSEAQLEEL AQMGLNFWSS PADSKPPNLP PPHRPLSSDG 

201 VCPASLRQLC LINGVHSIKT RTPSELECSQ TNGALCFINP LFLKVHSQDL 

251 SGGLKRPSTR TPNANGTERT RSPPPRPPPP AINSLHTSPR LARTETQTSM 

301 PETVNHNKHG NVALPGTKPT PIPPPRLKKQ ASFLEAEGGA KTLSGGRPGA 

351 GPELELGTAG SPGGAPPEAA PGDCTRAPPP SSESRPPCHG GRQRLSDMSI 

401 STSSSDSLEF DRSMPLFGYE ADTNSSLEDY EGESDQETMA PPIKSKKKRS 

451 SSFVLPKLVK SQLQKVSGVF SSFMTPEKRM VRRIAELSRD KCTYFGCLVQ 

501 DYVSFLQENK ECHVSSTDML QTIRQFMTQV KNYLSQSSEL DPPIESLIPE 

551 DQIDVVLEKA MHKCILKPLK GHVEAMLKDF HMADGSWKQL KENLQLVRQR 

601 NPQELGVFAP TPDFVDVEKI KVKFMTMQKM YSPEKKVMLL LRVCKLIYTV 

651 MENNSGRMYG ADDFLPVLTY VIAQCDMLEL DTEIEYMMEL LDPSLLHGEG 

701 GYYLTSAYGA LSLIKNFQEE QAARLLSSET RDTLRQWHKR RTTNRTIPSV 

751 DDFQNYLRVA FQEVNSGCTG KTLLVRPYIT TEDVCQICAE KFKVGDPEEY 

801 SLFLFVDETW QQLAEDTYPQ KIKAELHSRP QPHIFHFVYK RIKNDPYGII 

851 FQNGEEDLTT S 
BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphutel_20g21, frame 2 


TREMBL:RNU80076_1 product: "RIN1"; Rattus norvegicus RIN1 mRNA, 
complete cds., N = 3, Score = 606, P = 6.8e-97 

PIR:A38637 Ras interactor RIN1 - human, N = 3, Score = 587, P = 1.9e-92 

TREMBL:HSRASINIi_1 product: "ras inhibitor"; Human ras inhibitor mRNA, 
3' end., N = 2, Score = 592, P = 9.8e-61 

SWISSPROT:RIN1_HUMAN RAS INTERACTION/INTERFERENCE PROTEIN 1 (RAS 
INHIBITOR JC99) (FRAGMENT)., N = 2, Score = 587, P = 4.1e-60 

PIR:B38637 Ras inhibitor (clone JC265) - human (fragment), N = 1, Score 
= 2446, P = 4.6e-254 


>PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 
Length = 471 


Score = 2446 (367.0 bits), Expect = 4.6e-254, P = 4.6e-254 
Identities = 471/471 (100%), Positives = 471/471 (100%) 

Query: 391 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 450 

GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 
Sbjct: 1 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 60 

Query: 451 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 510 

SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 
Sbjct: 61 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 120 

Query: 511 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 570 

ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 
Sbjct: 121 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 180 

Query: 571 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 630 

GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 
Sbjct: 181 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 240 

Query: 631 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 690 

YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 
Sbjct: 241 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 300 

Query: 691 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 750 

LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 
Sbjct: 301 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 360 

Query: 751 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 810 

DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 
Sbjct: 361 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 420 

Query: 811 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 861 

QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 
Sbjct: 421 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 471 


HSPs: 


Pedant information for DKFZphutel_20g21, frame 2 


Report for DKFZphutel_20g21.2 


[LENGTH] 861 

[MW] 96380.26 

[pI] 6.15 

[HOMOL] PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 0.0 

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 3e-10 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 3e-10 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 3e-10 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 3e-10 

[PIRKW] alternative splicing 3e-59 

[SUPFAM] Ras interactor RIN1 3e-59 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 11.27 % 

SEQ MVRTDVNLENGLEPAETHSMVRHKDGGYSEEEDVKTCARDSGYDSLSNRLSILDRLLHTH 

SEG 

PRD ccccceeeccccccccceeeeeecccccccccceeeeeeccccccchhhhhhhhhhhhhh 

SEQ PIWLQLSLSEEEAAEVLQAQPPGIFLVHKSTKMQKKVLSLRLPCEFGAPLKEFAIKESTY 

SEG . . .xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccccceeeeechhhhhhhhhhhcccccccccceeeeeeecc 

SEQ TFSLEGSGISFADLFRLIAFYCISRDVLPFTLKLPYAISTAKSEAQLEELAQMGLNFWSS 
SEG 

PRD ceeecccccchhhhhhhhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhccccccc 

SEQ PADSKPPNLPPPHRPLSSDGVCPASLRQLCLINGVHSIKTRTPSELECSQTNGALCFINP 

SEG xxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhcccccccccccccccccccccccceeeecc 

SEQ LFLKVHSQDLSGGLKRPSTRTPNANGTERTRSPPPRPPPPAINSLHTSPRLARTETQTSM 

SEG xxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PETVNHNKHGNVALPGTKPTPIPPPRLKKQASFLEAEGGAKTLSGGRPGAGPELELGTAG 

SEG XXXXXXXXXXX XX 

PRD eeeeeccccccccccccccccccccchhhhhhhhhhhccccccccccccccceeeeeccc 

SEQ SPGGAPPEAAPGDCTRAPPPSSESRPPCHGGRQRLSDMSISTSSSDSLEFDRSMPLFGYE 

SEG XXXXXXXXXXXX XXXXXXXXXX XXXXXXXXXXXXXXXXXX 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeccccccceee 

SEQ ADTNSSLEDYEGESDQETMAPPIKSKKKRSSSFVLPKLVKSQLQKVSGVFSSFMTPEKRM 

SEG xxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhcchhhh 

SEQ VRRIAELSRDKCTYFGCLVQDYVSFLQENKECHVSSTDMLQTIRQFMTQVKNYLSQSSEL 

SEG 

PRD hhhhhhhhhhchhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ DPPIESLIPEDQIDVVLEKAMHKCILKPLKGHVEAMLKDFHMADGSWKQLKENLQLVRQR 

SEG 

PRD ccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhccccchhhhhhhhhhhhh 

SEQ NPQELGVFAPTPDFVDVEKIKVKFMTMQKMYSPEKKVMLLLRVCKLIYTVMENNSGRMYG 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhcccccc 

SEQ ADDFLPVLTYVIAQCDMLELDTEIEYMMELLDPSLLHGEGGYYLTSAYGALSLIKNFQEE 

SEG 

PRD cccccccceeecccccchhhhhhhhhhhhhhcccccccccceeeeehhhhhhhhhhhhhh 

SEQ QAARLLSSETRDTLRQWHKRRTTNRTIPSVDDFQNYLRVAFQEVNSGCTGKTLLVRPYIT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhccccccceeeeecccccc 

SEQ TEDVCQICAEKFKVGDPEEYSLFLFVDETWQQLAEDTYPQKIKAELHSRPQPHIFHFVYK 

SEG 

PRD chhhhhhhhhheeecccccceeeeehhhhhhcccccccchhhhhhhhhccccceeeehhh 

SEQ RIKNDPYGIIFQNGEEDLTTS 

SEG 

PRD hhccccceeeeeccccccccc 
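
The PRD rows encode one predicted secondary-structure state per residue (h, e or c; commonly read as helix, strand and coil). A summary such as the All_Alpha keyword presumably reflects the balance of these states. The Python sketch below tallies the states in one or more PRD rows; the function name is hypothetical, and it ignores any character other than h, e and c so that scanning damage does not distort the counts.

```python
from collections import Counter


def structure_composition(prd_rows):
    """Count h/e/c states across PRD rows and return (counts, fractions)."""
    counts = Counter(ch for row in prd_rows for ch in row if ch in "hec")
    total = sum(counts.values())
    fractions = {state: counts[state] / total for state in "hec"} if total else {}
    return counts, fractions


# Final PRD row of the report above.
counts, fractions = structure_composition(["hhccccceeeeeccccccccc"])
print(dict(counts))
print({state: round(value, 2) for state, value in fractions.items()})
```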

(No Prosite data available for DKFZphutel_20g21.2) 
(No Pfam data available for DKFZphutel_20g21.2) 
DKFZphutel_20hl3 


group: intracellular transport and trafficking 

DKFZphutel_20hl3 encodes a novel 955 amino acid protein with similarity to alpha-adaptins. 

Adaptins are components of the adaptor complexes which link clathrin to receptors in coated 
vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles, 
separate into two bands on SDS gels, designated A and C. The novel protein is very similar to 
both alpha-adaptin A and alpha-adaptin C and therefore appears to be a new human alpha-adaptin. 

The new protein can find application in modulating endocytosis and vesicle trafficking in 
cells. 


strong similarity to alpha-adaptins 

complete cDNA, complete cds start at Bp 78, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3352 bp 

Poly A stretch at pos. 3297, polyadenylation signal at pos. 3279 
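
A polyadenylation signal of this kind can be located by searching the 3' end of the insert for the canonical AATAAA hexamer. The Python sketch below does this on the last bases before the poly A tail of this clone. The function name is hypothetical, and only the canonical hexamer is searched, which is an assumption since the source does not say which signal variants were considered. The function returns a 0-based offset. In the fragment shown, which starts at base 3271 of the listing, the hexamer is found at offset 9, i.e. at base 3280 in 1-based numbering; the reported position 3279 would match a 0-based offset, though the source does not state its convention.

```python
def find_polya_signal(seq, signal="AATAAA"):
    """0-based offset of the last occurrence of the signal hexamer, or -1 if absent."""
    return seq.upper().rfind(signal)


# Bases 3271 onward of the insert, up to the start of the poly A tail.
tail = "TTGTGAGCGAATAAACAGAGAGACGCTAAAAAAAAAAAA"
offset = find_polya_signal(tail)
print(offset)             # 9
print(3271 + offset)      # 3280 (1-based position in the full listing)
```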


1 GCGCCCGGTC CCCGCTTGCC AGCCCCCGCT GCTCTGTGCC CTGTCCGGCC 
51 AGGCCTGGAG CCGACACCAC CGCCATCATG CCGGCCGTGT CCAAGGGCGA 
101 TGGGATGCGG GGGCTCGCGG TGTTCATCTC CGACATCCGG AACTGTAAGA 
151 GCAAAGAGGC GGAAATTAAG AGAATCAACA AGGAACTGGC CAACATCCGC 
201 TCCAAGTTCA AAGGAGACAA AGCCTTGGAT GGCTACAGTA AGAAAAAATA 
251 TGTGTGTAAA CTGCTTTTCA TCTTCCTGCT TGGCCATGAC ATTGACTTTG 
301 GGCACATGGA GGCTGTGAAT CTGTTGAGTT CCAATAAATA CACAGAGAAG 
351 CAAATAGGTT ACCTGTTCAT TTCTGTGCTG GTGAACTCGA ACTCGGAGCT 
401 GATCCGCCTC ATCAACAACG CCATCAAGAA TGACCTGGCC AGCCGCAACC 
451 CCACCTTCAT GTGCCTGGCC CTGCACTGCA TCGCCAACGT GGGCAGCCGG 
501 GAGATGGGCG AGGCCTTTGC CGCTGACATC CCCCGCATCC TGGTGGCCGG 
551 GGACAGCATG GACAGTGTCA AGCAGAGTGC GGCCCTGTGC CTCCTTCGAC 
601 TGTACAAGGC CTCGCCTGAC CTGGTGCCCA TGGGCGAGTG GACGGCGCGT 
651 GTGGTACACC TGCTCAATGA CCAGCACATG GGTGTGGTCA CGGCCGCCGT 
701 CAGCCTCATC ACCTGTCTCT GCAAGAAGAA CCCAGATGAC TTCAAGACGT 
751 GCGTCTCTCT GGCTGTGTCG CGCCTGAGCC GGATCGTCTC CTCTGCCTCC 
801 ACCGACCTCC AGGACTACAC CTACTACTTC GTCCCAGCAC CCTGGCTCTC 
851 GGTGAAGCTC CTGCGGCTGC TGCAGTGCTA CCCGCCTCCA GAGGATGCGG 
901 CTGTGAAGGG GCGGCTGGTG GAATGTCTGG AGACTGTGCT CAACAAGGCC 
951 CAGGAGCCCC CCAAATCCAA GAAGGTGCAG CATTCCAACG CCAAGAACGC 
1001 CATCCTCTTC GAGACCATCA GCCTCATCAT CCACTATGAC AGTGAGCCCA 
1051 ACCTCCTGGT TCGGGCCTGC AACCAGCTGG GCCAGTTCCT GCAGCACCGG 
1101 GAGACCAACC TGCGCTACCT GGCCCTGGAG AGCATGTGCA CGCTGGCCAG 
1151 CTCCGAGTTC TCCCATGAAG CCGTCAAGAC GCACATTGAC ACCGTCATCA 
1201 ATGCCCTCAA GACGGAGCGG GACGTCAGCG TGCGGCAGCG GGCGGCTGAC 
1251 CTCCTCTACG CCATGTGTGA CCGGAGCAAT GCCAAGCAGA TCGTGTCGGA 
1301 GATGCTGCGG TACCTCGAGA CGGCAGACTA CGCCATCCGC GAGGAGATCG 
1351 TCCTGAAGGT GGCCATCCTG GCCGAGAAGT ACGCCGTGGA CTACAGCTGG 
1401 TACGTGGACA CCATCCTCAA CCTCATCCGC ATTGCGGGCG ACTACGTGAG 
1451 TGAGGAGGTG TGGTACCGTG TGCTACAGAT CGTCACCAAC CGTGATGACG 
1501 TCCAGGGCTA TGCCGCCAAG ACCGTCTTTG AGGCGCTCCA GGCCCCTGCC 
1551 TGTCACGAGA ACATGGTGAA GGTTGGCGGC TACATCCTTG GGGAGTTTGG 
1601 GAACCTGATT GCTGGGGACC CCCGCTCCAG CCCCCCAGTG CAGTTCTCCC 
1651 TGCTCCACTC CAAGTTCCAT CTGTGCAGCG TGGCCACGCG GGCGCTGCTG 
1701 CTGTCCACCT ACATCAAGTT CATCAACCTC TTCCCCGAGA CCAAGGCCAC 
1751 CATCCAGGGC GTCCTGCGGG CCGGCTCCCA GCTGCGCAAT GCTGACGTGG 
1801 AGCTGCAGCA GCGAGCCGTG GAGTACCTCA CCCTCAGCTC AGTGGCCAGC 
1851 ACCGACGTCC TGGCCACGGT GCTGGAGGAG ATGCCGCCCT TCCCCGAGCG 
1901 CGAGTCGTCC ATCCTGGCCA AGCTGAAACG CAAGAAGGGG CCAGGGGCCG 
1951 GCAGCGCCCT GGACGATGGC CGGAGGGACC CCAGCAGCAA CGACATCAAC 
2001 GGGGGCATGG AGCCCACCCC CAGCACTGTG TCGACGCCCT CGCCCTCCGC 
2051 CGACCTCCTG GGGCTGCGGG CAGCCCCTCC CCCGGCAGCA CCCCCGGCTT 
2101 CTGCAGGAGC AGGGAACCTT CTGGTGGACG TCTTCGATGG CCCGGCCGCC 
2151 CAGCCCAGCC TGGGGCCCAC CCCCGAGGAG GCCTTCCTCA GCCCAGGTCC 
2201 TGAGGACATC GGCCCTCCCA TTCCGGAAGC CGATGAGTTG CTGAATAAGT 
2251 TTGTGTGTAA GAACAACGGG GTCCTGTTCG AGAACCAGCT GCTGCAGATC 
2301 GGAGTCAAGT CAGAGTTCCG ACAGAACCTG GGCCGCATGT ATCTCTTCTA 
2351 TGGCAACAAG ACCTCGGTGC AGTTCCAGAA TTTCTCACCC ACTGTGGTTC 
2401 ACCCGGGAGA CCTCCAGACT CAGCTGGCTG TGCAGACCAA GCGCGTGGCG 
2451 GCGCAGGTGG ACGGCGGCGC GCAGGTGCAG CAGGTGCTCA ATATCGAGTG 
2501 CCTGCGGGAC TTCCTGACGC CCCCGCTGCT GTCCGTGCGC TTCCGGTACG 
2551 GTGGCGCCCC CCAGGCCCTC ACCCTGAAGC TCCCAGTGAC CATCAACAAG 
2601 TTCTTCCAGC CCACCGAGAT GGCGGCCCAG GATTTCTTCC AGCGCTGGAA 
2651 GCAGCTGAGC CTCCCTCAAC AGGAGGCGCA GAAAATCTTC AAAGCCAACC 
2701 ACCCCATGGA CGCAGAAGTT ACTAAGGCCA AGCTTCTGGG GTTTGGCTCT 
2751 GCTCTCCTGG ACAATGTGGA CCCCAACCCT GAGAACTTCG TGGGGGCGGG 
2801 GATCATCCAG ACTAAAGCCC TGCAGGTGGG CTGTCTGCTT CGGCTGGAGC 
2851 CCAATGCCCA GGCCCAGATG TACCGGCTGA CCCTGCGCAC CAGCAAGGAG 
2901 CCCGTCTCCC GTCACCTGTG TGAGCTGCTG GCACAGCAGT TCTGAGCCCT 
2951 GGACTCTGCC CCGGGGGATG TGGCCGGCAC TGGGCAGCCC CTTGGACTGA 
3001 GGCAGTTTTG GTGGATGGGG GACCTCCACT GGTGACAGAG AAGACACCAG 
3051 GGTTTGGGGG ATGCCTGGGA CTTTCCTCCG GCCTTTTGTA TTTTTATTTT 
3101 TGTTCATCTG CTGCTGTTTA CATTCTGGGG GGTTAGGGGG AGTCCCCCTC 
3151 CCTCCCTTTC CCCCCCAAGC ACAGAGGGGA GAGGGGCCAG GGAAGTGGAT 
3201 GTCTCCTCCC CTCCCACCCC ACCCTGTTGT AGCCCCTCCT ACCCCCTCCC 
3251 CATCCAGGGG CTGTGTATTA TTGTGAGCGA ATAAACAGAG AGACGCTAAA 
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AA 


BLAST Results 


No BLAST result 


Medline entries 


89155572: 

Cloning of cDNAs encoding two related 100-kD coated vesicle proteins 
(alpha-adaptins) . 

97431776: 

Alpha-adaptin, a marker for endocytosis, is expressed in complex 
patterns during Drosophila 
development. 


Peptide information for frame 3 


ORF from 78 bp to 2942 bp; peptide length: 955 
Category: strong similarity to known protein 


1 MPAVSKGDGM RGLAVFISDI RNCKSKEAEI KRINKELANI RSKFKGDKAL 
51 DGYSKKKYVC KLLFIFLLGH DIDFGHMEAV NLLSSNKYTE KQIGYLFISV 
101 LVNSNSELIR LINNAIKNDL ASRNPTFMCL ALHCIANVGS REMGEAFAAD 
151 IPRILVAGDS MDSVKQSAAL CLLRLYKASP DLVPMGEWTA RVVHLLNDQH 
201 MGVVTAAVSL ITCLCKKNPD DFKTCVSLAV SRLSRIVSSA STDLQDYTYY 
251 FVPAPWLSVK LLRLLQCYPP PEDAAVKGRL VECLETVLNK AQEPPKSKKV 
301 QHSNAKNAIL FETISLIIHY DSEPNLLVRA CNQLGQFLQH RETNLRYLAL 
351 ESMCTLASSE FSHEAVKTHI DTVINALKTE RDVSVRQRAA DLLYAMCDRS 
401 NAKQIVSEML RYLETADYAI REEIVLKVAI LAEKYAVDYS WYVDTILNLI 
451 RIAGDYVSEE VWYRVLQIVT NRDDVQGYAA KTVFEALQAP ACHENMVKVG 
501 GYILGEFGNL IAGDPRSSPP VQFSLLHSKF HLCSVATRAL LLSTYIKFIN 
551 LFPETKATIQ GVLRAGSQLR NADVELQQRA VEYLTLSSVA STDVLATVLE 
601 EMPPFPERES SILAKLKRKK GPGAGSALDD GRRDPSSNDI NGGMEPTPST 
651 VSTPSPSADL LGLRAAPPPA APPASAGAGN LLVDVFDGPA AQPSLGPTPE 
701 EAFLSPGPED IGPPIPEADE LLNKFVCKNN GVLFENQLLQ IGVKSEFRQN 
751 LGRMYLFYGN KTSVQFQNFS PTVVHPGDLQ TQLAVQTKRV AAQVDGGAQV 
801 QQVLNIECLR DFLTPPLLSV RFRYGGAPQA LTLKLPVTIN KFFQPTEMAA 
851 QDFFQRWKQL SLPQQEAQKI FKANHPMDAE VTKAKLLGFG SALLDNVDPN 
901 PENFVGAGII QTKALQVGCL LRLEPNAQAQ MYRLTLRTSK EPVSRHLCEL 
951 LAQQF 
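
The peptide can be re-derived from the nucleotide listing by translating the ORF with the standard genetic code. The Python sketch below builds the codon table from the conventional TCAG ordering and translates a short fragment around the reported start at Bp 78. The function name and the 1-based inclusive coordinates are assumptions made for illustration, and codons containing unreadable characters are translated as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start, end):
    """Translate seq[start..end] (1-based, inclusive); unknown codons become 'X'."""
    orf = seq[start - 1:end].upper()
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, usable, 3))


# Bases 71-100 of the insert; the ORF start (base 78) is base 8 of this fragment.
fragment = "CGCCATCATGCCGGCCGTGTCCAAGGGCGA"
print(translate_orf(fragment, 8, 28))  # MPAVSKG
```

The translated fragment matches the first seven residues of the peptide listed above.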

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20hl3, frame 3 

PIR:B30111 alpha-adaptin C - mouse, N = 1, Score = 3990, P = 0 

PIR:S11276 alpha-adaptin C - rat, N = 1, Score = 3987, P = 0 

SWISSPROT:ADAC_RAT ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX 2 
ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA MEMBRANE 
ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score = 3982, P = 0 

SWISSPROT:ADAC_MOUSE ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX 
2 ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA 
MEMBRANE ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score = 
3976, P = 0 

TREMBL:AB020706_1 gene: "KIAA0899"; product: "KIAA0899 protein"; Homo 
sapiens mRNA for KIAA0899 protein, partial cds., N = 1, Score = 3932, P 
= 0 


>PIR:B30111 alpha-adaptin C - mouse 
Length = 938 

HSPs: 

Score = 3990 (598.6 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 787/955 (82%), Positives = 858/955 (89%) 

Query: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 
Sbjct: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

Query: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120 

KLLFIFLLGHDIDFGHMEAVNLLSSN+YTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 
Sbjct: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120 

Query: 121 ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 180 

ASRNPTFM LALHCIANVGSREM EAFA +IP+ILVAGD+MDSVKQSAALCLLRLY+ SP 
Sbjct: 121 ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 180 

Query: 181 DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 240 

DLVPMG+WT+RVVHLLNDQH+GVVTAA SLIT L +KNP++FKT VSLAVSRLSRIV+SA 
Sbjct: 181 DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 240 

Query: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 300 

STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP D AV+GRL ECLET+LNKAQEPPKSKKV 
Sbjct: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP-DPAVRGRLTECLETILNKAQEPPKSKKV 299 

Query: 301 QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 360 

QHSNAKNA+LFE ISLIIH+DSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 
Sbjct: 300 QHSNAKNAVLFEAISLIIHHDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 359 

Query: 361 FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 420 

FSHEAVKTHI+TVINALKTERDVSVRQRA DLLYAMCDRSNA+QIV+EML YLETADY+I 
Sbjct: 360 FSHEAVKTHIETVINALKTERDVSVRQRAVDLLYAMCDRSNAQQIVAEMLSYLETADYSI 419 

Query: 421 REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 480 

REEIVLKVAILAEKYAVDY+WYVDTILNLIRIAGDYVSEEVWYRV+QIV NRDDVQGYAA 
Sbjct: 420 REEIVLKVAILAEKYAVDYTWYVDTILNLIRIAGDYVSEEVWYRVIQIVINRDDVQGYAA 479 

Query: 481 KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 540 

KTVFEALQAPACHEN+VKVGGYILGEFGNLIAGDPRSSP +QF+LLHSKFHLCSV TRAL 
Sbjct: 480 KTVFEALQAPACHENLVKVGGYILGEFGNLIAGDPRSSPLIQFNLLHSKFHLCSVPTRAL 539 

Query: 541 LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 600 

LLSTYIKF+NLFPE KATIQ VLR+ SQL+NADVELQQRAVEYL LS+VASTD+LATVLE 
Sbjct: 540 LLSTYIKFVNLFPEVKATIQDVLRSDSQLKNADVELQQRAVEYLRLSTVASTDILATVLE 599 

Query: 601 EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTP---STVSTPSPS 657 

EMPPFPERESSILAKLK+KKGP + L++ +R+ S D+NGG EP P S STPSPS 
Sbjct: 600 EMPPFPERESSILAKLKKKKGPSTVTDLEETKRERSI-DVNGGPEPVPASTSAASTPSPS 658 

Query: 658 ADLLGLRAAPP-PAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIP 716 

ADLLGL A PP P PP S+G G LLVDVF A+ ++ P L+PG ED 
Sbjct: 659 ADLLGLGAVPPAPTGPPPSSGGG-LLVDVFSDSAS— AVAP LAPGSEDN 704 

Query: 717 EADELLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHP 776 

+FVCKNNGVLFENQLLQIG+KSEFRQNLGRM++FYGNKTS QF NF+PT++ 
Sbjct: 705 FARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA 759 

Query: 777 GDLQTQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLP 836 

DLQT L +QTK V VDGGAQVQQV+NIEC+ DF P+L+++FRYGG Q +++KLP 
Sbjct: 760 DDLQTNLNLQTKPVDPTVDGGAQVQQVVNIECISDFTEAPVLNIQFRYGGTFQNVSVKLP 819 

Query: 837 VTINKFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDN 896 

+T+NKFFQPTEMA+QDFFQRWKQLS PQQE Q IFKA HPMD E+TKAK++GFGSALL+ 
Sbjct: 820 ITLNKFFQPTEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDTEITKAKIIGFGSALLEE 879 

Query: 897 VDPNPENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 955 
VDPNP NFVGAGII TK Q+GCLLRLEPN QAQMYRLTLRTSK+ VS+ LCELL++QF 
Sbjct: 880 VDPNPANFVGAGIIHTKTTQIGCLLRLEPNLQAQMYRLTLRTSKDTVSQRLCELLSEQF 938 
Pedant information for DKFZphutel_20hl3, frame 3 
Report for DKFZphutel_20hl3.3 


[LENGTH] 955 

[MW] 105361.97 

[pI] 7.75 

[HOMOL] PIR:A30111 alpha-adaptin A - mouse 0.0 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBL037w] 5e-67 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YBL037w] 5e-67 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBL037w] 5e-67 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDR238c] 4e-04 

[PIRKW] heterodimer 0.0 

[PIRKW] transmembrane protein 1e-65 

[PIRKW] membrane trafficking 0.0 

[PIRKW] receptor 0.0 

[SUPFAM] beta-adaptin 5e-16 

[PROSITE] MYRISTYL 7 

[PROSITE] IG_MHC 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 11 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 15 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 6.81 % 


SEQ MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 
SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccc^ 

SEQ KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 
SEG 

PRD hhhhhhhcccccccchhhhhhhhlicccccchhlihhhhhhhhhhcchhhhhh^ 

SEQ ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 

SEG 

PRD cccccchhhhhhhhhhccchhhhhhhhhhhhhheeeccccchhhhhhhhhhhhhh^ 

SEQ DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 

SEG 

PRD cccccccchhhhhhhhhcccceeeehhhhhhhhhhhcccchhhhhhhhhhhhhhhhJihcc 

SEQ STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 

SEG 

PRD ccccccceeeecccchhhhhhhhhhhhhcccccchhhhhhhjih^ 

SEQ QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 

SEG 

PRD cccccchhhhhhhhhhhhhcccccceeeeehhhhhhhhhhccccceeeehhhhhhhhh^ 

SEQ FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 

SEG 

PRD cchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccch 

SEQ REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 

SEG 

PRD hhlihhhhhhhhhhhhhccchhhhhhhhhhlihhhccccchhhhhhhheeeccccchjih^ 

SEQ KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 

SEG 

PRD lihhhhhhhhhcccccceeeeeeeecccccccccccccccchhhhhhhhhhhcccchlihhh 

SEQ I-LSTYXKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 

SEG 

PRD hhhhhhhhhhccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhh^ 

SEQ EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTPSTVSTPSPSADL 

* • xxxxxxxxxxxxxxx 

PRD nccccccchhhhhhhhhliccccccccccccccccccccccccccccccccccccccccce 

SEQ LGLRAAPPPAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIPEADE 

SEG 3CXXXXXXXXXXXXXXXXXXXXX . ••••»••■•••...,,,,.,, , xxxxxxxxxxxxxx 
PRD eecccccccccccccccccceeeeeeccccccccccccccceeecccccccccccccccc 

SEQ LLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTWHPGDLQ 

SEG 

PRD cceeeeeccccccchhhhhhhhcchhhhhccccceeeccccccccccccceeeeccchhh 

SEQ TQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLPVTIN 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccchhhhhhhhhhccccccccceeeeeeccccccccccccccccc 

SEQ KFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDNVDPN 

SEG 

PRD cccccchhhhhhhhhhhhhhhchhhhhhhhhhhcccchhhhhhhhhhccccceeeecccc 

SEQ PENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 

SEG 

PRD ccceeeceeeeeccccceeeeecccchhhhhhhhhhhccccchhhhhhhhhhccc 


Prosite for DKFZphutel_20hl3.3 

PS00001 760->764 ASN_GLYCOSYLATION PDOC00001 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005 
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005 
PS00005 163->166 PKC_PHOSPHO_SITE PDOC00005 
PS00005 189->192 PKC_PHOSPHO_SITE PDOC00005 
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005 
PS00005 297->300 PKC_PHOSPHO_SITE PDOC00005 
PS00005 379->382 PKC_PHOSPHO_SITE PDOC00005 
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005 
PS00005 470->473 PKC_PHOSPHO_SITE PDOC00005 
PS00005 787->790 PKC_PHOSPHO_SITE PDOC00005 
PS00005 819->822 PKC_PHOSPHO_SITE PDOC00005 
PS00005 832->835 PKC_PHOSPHO_SITE PDOC00005 
PS00005 935->938 PKC_PHOSPHO_SITE PDOC00005 
PS00005 938->941 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 368->372 CK2_PHOSPHO_SITE PDOC00006 
PS00006 379->383 CK2_PHOSPHO_SITE PDOC00006 
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006 
PS00006 482->486 CK2_PHOSPHO_SITE PDOC00006 
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006 
PS00006 626->630 CK2_PHOSPHO_SITE PDOC00006 
PS00006 636->640 CK2_PHOSPHO_SITE PDOC00006 
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006 
PS00006 938->942 CK2_PHOSPHO_SITE PDOC00006 
PS00007 388->395 TYR_PHOSPHO_SITE PDOC00007 
PS00007 411->419 TYR_PHOSPHO_SITE PDOC00007 
PS00007 434->443 TYR_PHOSPHO_SITE PDOC00007 
PS00008 202->208 MYRISTYL PDOC00008 
PS00008 508->514 MYRISTYL PDOC00008 
PS00008 561->567 MYRISTYL PDOC00008 
PS00008 623->629 MYRISTYL PDOC00008 
PS00008 759->765 MYRISTYL PDOC00008 
PS00008 826->832 MYRISTYL PDOC00008 
PS00008 908->914 MYRISTYL PDOC00008 
PS00009 630->634 AMIDATION PDOC00009 
PS00290 127->134 IG_MHC PDOC00262 


(No Pfam data available for DKFZphutel_20hl3.3) 
DKFZphutel_20mll 
group: cell cycle 

DKFZphutel_20mll encodes a novel 225 amino acid protein with similarity to yeast sds22 and 
protein phosphatase-1 regulatory subunits. 

The new protein can find application in modulating/blocking the activity of protein 
phosphatase-1 and in modulating the cell cycle. 


sds22 is a regulatory polypeptide of protein phosphatase-1 that is required for the completion 
of mitosis in both fission and budding yeast. The novel protein seems to be a new regulator 
of protein phosphatase-1. 


similarity to suppressor protein sds22 

complete cDNA, complete cds, EST hits 
localisation? only a part of the STS matches 

Sequenced by AGOWA 

Locus: /map="17-? 

Insert length: 5822 bp 

Poly A stretch at pos. 5803, polyadenylation signal at pos. 5786 

1 GGGCGCTTGG TTCCCCAGCA ACCGGGAGAC GCGTCTGCTG CGTGGAACCG 
51 CCGAGTTCCC AGCGCTTGAG AAGGAAAATT CTGGATCTGT TATCTGTGAG 
101 GAGGCCACTC CGTTGACAGT TGTGTAAAAC TCTGCTGCTT TCCCCAGCTC 
151 CAACCTCTCT GGTCTTCAAC AACACTATCA TCAGGGAAAA CGTGGGGGAA 
201 GATGAACCAG CCGTGCAACT CGATGGAGCC GAGGGTGATG GACGATGACA 
251 TGCTCAAGCT GGCCGTCGGG GACCAGGGCC CCCAGGAGGA GGCCGGGCAG 
301 CTGGCCAAGC AGGAGGGCAT CCTCTTCAAG GATGTCCTGT CCCTGCAGCT 
351 GGACTTTCGG AACATCCTCC GCATAGACAA CCTCTGGCAG TTTGAGAACT 
401 TGAGGAAGCT GCAGCTGGAC AATAACATCA TTGAGAAGAT CGAGGGCCTG 
451 GAGAACCTCG CACACCTGGT CTGGCTGGAT CTGTCTTTCA ACAACATTGA 
501 GACCATCGAG GGGCTGGACA CACTGGTGAA CCTGGAGGAC CTGAGCTTGT 
551 TCAACAACCG GATCTCCAAG ATCGACTCCC TGGACGCCCT CGTCAAGCTG 
601 CAGGTGTTGT CGCTGGGCAA CAACCGGATT GACAACATGA TGAACATCAT 
651 CTACCTCCGG CGGTTCAAGT GCCTGCGGAC GCTCAGCCTC TCTAGGAACC 
701 CTATCTCTGA GGCAGAGGAT TACAAGATGT TCATCTGTGC CTACCTTCCT 
751 GACCTCATGT ACCTGGACTA CCGGCGCATT GATGACCACA CAGCAAGTGT 
801 CTCCCTCTCA GTCTCCCAGC CCTGTGAGAC AGATTCCTCA AGCCCCCAGG 
851 TTTCTTGGAA AAGGGGCATT GAAGAGTAGC TTCCCCTGCC CACAACTAGG 
901 AGAGAAAGGG CAGCTCCCTC TTCCTAATCC CTTTACCTGA CTCTGTCAGA 
951 GTGATTCCAG CAGCACCCTT GTAAGTACTG TTTTGTGTGC GTTCCCAGGG 
1001 GCCAGGCCTC TTCCACACAC TGTCCCAGGG CCACCTCACA GCCATCCTGC 
1051 ACTGTCTAGT TTTCCAGATG AAGAAGCTGA GGAGGGCTGG GAGCAGTGGC 
1101 TCACGCCTGT AATCCCAGCA CTTTGAGAGG CTGAGGCGGG AGGATCGCTT 
1151 GAGCCAAGGA GTTCAAGACC AGCCTGGGCA ACATAGGGAG ACCCCATCTC 
1201 TACAGAAACT ACCAAAATTA GCCAGGTGTG GTGGCACACA CCAGTAATCC 
1251 TGGCTACTCA CAAGGCCGAG GTAGAAGAAT CGCTTGAGAC TAGGAGTTTG 
1301 AGGCTGCAGT GAACTAAGAA GATGCCATTG CACTCCAGCC TGGGCAACAG 
1351 AGTGAAAAAA TTAAAAAATT AGAAAAGAAA AGAAGTTGAG GAGGCCCAAG 
1401 GAGGGCAAGC AGCCAGGATC ACTGGCTCAA GGCCAAGCCA GGATTCACCC 
1451 TAAGTTGGTG TCATCCCAGG AGCAATATTA ACAGCTGAGC TCCAGAGGGA 
1501 ACCAGGCCAT CAGAGGCTCA GGCCTGGCTC TCAGGGGCAG AGTCAGGGCT 
1551 GGAGGTAGAG ACCTGAGTGT CATCTGAGGA TTGCCAATTG GCAGTAGTTG 
1601 AAGCCATGGT ACAGGTGGGA TCACCTGGGG CACATGGAGT GAGCTGGGGG 
1651 ACGGGGACTA AGTTCTAGAG GTGCCAGCAT TCCTGGCCAG GTACAGGGGG 
1701 ATGAGCCAGT GCGGTGGAGA GAGCCAAGGG CCAGACCCTC GTGACCAGCC 
1751 CTATGGCCTC ACTCTACCTC TGTCCTGTTG TCCTCCTTCC CTAAAAGAGG 
1801 GCCAGAAGGC CTGCTGAGGG CTGTTGGGAG TGAGAGAGCA AGTCCTCTGT 
1851 GGAGAACACC CAGTCTGGGG CGAGGGGAGC GCTCCATTGC TGTGGCTCCT 
1901 GCCCTGGAGA TGGCCCCGGG AACCCCAGCC TGCCACGCTG CCTTCCGCTC 
1951 CTCCTGGTCT TTCCCTGATT TCCCTGCGCT CACAAAAACC TGGTGAGGGT 
2001 CATCAGGAGA TGGGCATTCT CATCCACGAG ACCTCATGGC TTTCACAGCC 
2051 TTCATGCAGG CCCCTGTGCA ACACCCCTGC CCATGCGCGG GAGGCTGCAG 
2101 CATGGCAGAG GCGGCATGGC AGAGGCGGTG TGGCTCGGAG GAACCTCTGG 
2151 TAACAATGCC ACTCCCGTTC CCTGGTCAGA AAAAGCTTGC GGAGGCTAAG 
2201 CACCAGTACA GCATCGACGA GCTGAAGCAC CAGGAGAACC TGATGCAGGC 
2251 CCAGCTGGAG GACGAGCAGG CGCAGCGGGA GGAGGTAGAG AAGCACAAGA 
2301 CTGCGTTTGT GGAACACCTG AATGGCTCCT TCCTGTTTGA CAGCATGTAC 
2351 GCTGAGGACT CAGAGGGCAA CAATCTGTCC TACCTGCCTG GTGTCGGTGA 
2401 GCTCCTTGAG ACCTACAAGG ACAAGTTTGT CATCATCTGC GTGAATATTT 
2451 TTGAGTATGG CCTGAAACAG CAGGAGAAGC GGAAAACAGA GCTTGACACC 
2501 TTCAGTGAAT GTGTCGGTGA GGCCATCCAG GAAAACCAGG AGCAGGGCAA 
2551 ACGCAAGATT GCCAAATTCG AGGAGAAGCA CTTGTCGAGT TTAAGTGCCA 
2601 TTCGAGAGGA GTTGGAACTG CCCAACATTG AGAAGATGAT CCTAGAATGC 
2651 AGTGCTGACA TCAGTGAGTT GTTCGATGCG CTCATGACGC TGGAGATGCA 
2701 GCTGGTGGAG CAGCTGGAGG TAAGGCTGGG CCCTGGGCAC AAGTGCCAGA 
2751 ATCTGGCGAT GCAGCTGCAC ATCCATAGGT GAACTGTAGC CTTCATGGGC 
2801 ACGCCTCTGC TGGAAACGTC CAGCACGACT CAGCGTGGCA GGCTGTAGCT 
2851 TTCTTGCTCA TCAGTCCTGT TTGCTTTTAT TACATTTTAA TCATTTACAT 
2901 TGGAAGTGAT TCTTGTGGAA AATGAGAGGT GAGCTCATTC TTCTGAAATG 
2951 GTCCCCCTAT CCTGGAAGTC AGTGGGGAGA GGTTTTTGAT TAGACCCCTG 
3001 GAGCTATCCG GGTACTCTAA AGGCAAAGCG CACCCCCACT TGGGGACCAA 
3051 ACAAAGACCC CTCCGCATTG CAGCCTGCAG TTGCCGCTTC TCAGGTGACG 
3101 TGAGGAGGCT GCAACTCAGC ACTAAGTAGT GAAAATGAAA AGCGCCGCTG 
3151 TCTGAAATTC ATTAGCAGCC AGAGTATGTG TTACAAGGCA GCGGAGGCTG 
3201 GGAGTCTGAA GTGGTGTGAT GAATTGAACC TCATCGGATG CTGCTGTGGC 
3251 TGGGCCAAGT GATAGCACCT AATCAATTCC TCACACGTCA AGTGACACCT 
3301 CAGACATGGG ATAGATTTCC CCATCACATC ACAGGGCAGG TGCTCCCTCC 
3351 CTGCTGGAGA GCACAGGCAC TGCAGAAGCA GCGCACAGTG CCAGGGGCGA 
3401 GTGAGGCAGC AGCTCCCAGC CTTTTCAGGC ACGGAGATTG CCTTTCAACA 
3451 TCCAAACATT TCCCAGAACC CATGTGCCAT CCTACTTGTA TTACTGGTGG 
3501 CCAGAAAGCC ACAAGCGCAA TCATGCTTTT CAATGACCCT ATTTTTATTC 
3551 ACGAGAACAG CACATACATG TGTTTGAAAA TTATGTGAGG TGCTCACTCT 
3601 GCAGACAGTA CTCACATTCC TATAGATTCC ACCCCTGCCC ACCTTGCAGC 
3651 CCCTGGAGTC TATAGCAGAT GGGAGTGGGG CACTCCGAGA GTGGCAGGCC 
3701 TGGAGATCAC ATCTTCCATT GTTCCTTCAA TCAACACTAA CTCCCATTTG 
3751 GGCCTTAGGT GCCTTGCTAA GCACCACAAA ACAGCAACTA ACTGAAAGAG 
3801 ATCTGGAGTG CCAGCCCGCT CCTACTGAGG GCCTCCTCTC TGTCAGGCAC 
3851 CTTGCAAAGC ATTTTGTGTG AAGTGACTCA TTTAACCTCA CCACAACGCC 
3901 ACAACGCAGG GATTATGCAG GTAACCTATT TCCCAGATGA GGAAGATAAG 
3951 GCCCAAGGAG GTGAAATGCC TTTCCCAGAG TTACACAGAG TGCTGGAGCT 
4001 GGGAATACTG ACCCAGGCAG TCTAGCTCTT AACAGCTCAC TCCACTGTTT 
4051 CCCTGGAGGT GATGCACAGA TGTCACTGGG AAACCCAAAG GAGAGGGGGT 
4101 TGGCTGTGTG TGTGTGTGTT GGGCAGGCAG GTAAGGGGAG TAAGACCAGG 
4151 ACAAGTGTTC CTGGCAAAGT TCCGGTGACA GCATTAAACA TTCAGATGGT 
4201 GAGGGAGTTA ATATGGTTGG AGAACAACAA CTTTAGAGAG AGCAGAGGGG 
4251 TCAGTTCACA ACCATCTGCT CAGGAGGGTC AAGATGGGTG GTCTTTATGC 
4301 TGAAGGTCTG TGATTAGAGG AGCTGGTTGC TAAATTTTGA GGAGTACCTT 
4351 TTGCTCTGTG CTGGACATCT AAATATGCAT GTTAACTGTG TTCTTTAACA 
4401 TTTCCAGGAG ACTATAAACA TGTTTGAAAG GAACATTGTT GACATGGTAG 
4451 GACTGTTTAT CGAAAATGTC CAAAGCCTAT ATCCTTTCTG TGATGACCTT 
4501 CCCCATGGGG AGGTGCTACA GAGCCCCTGG GCTTGTCCCG GCCTCTGGAC 
4551 AAAAGAATGT TCCACAGGGT CTGAGGAGGT TTCCCGACCC TCAGAACAAT 
4601 GATGGCCTGG TTAGAGCTGT GGTTTGGATG CCCAGAGGGA CAACATCCAA 
4651 ACTGTTTGCA GTAGGCTCCC AGCATGATTG TTCTCATATG AGTGATGTTC 
4701 ACTAGGAAAT GACGCCCCCT GTGTTGCAGG CAAGCACACT CTGGGGTTGA 
4751 GGCAACCCCC ACGTGGAAGA CACTATAAGG AGTACATCAG GTGAAATGTT 
4801 AGGGTGAGGA GCCAACATCG GAGCATGGCC AACCCTTCTT CCACCCGAAC 
4851 TCAGGGCACT CCACATGGGG CAAACTGCTG TGCTCCAGCT AGCAGCAGCC 
4901 CTGTGGTCCT GCCCTCCTGG GGCTCACAGT CCCTCAGGGA GACAAGTTGT 
4951 AGAGGCAACA AGTGGTGCCA AATGCACAGG GTGAGAAGCA GTTAACCCAG 
5001 AGGCCAGGAG CCTCCATGCA GGAGGGAGAG AAGAGTGTGA TGGCAGGGGC 
5051 CGAGGGTCCG TCCGAGGTGT GGGGCAGGGG CAGGGAGTCG AGGAAGGCCC 
5101 AGGGTTCGGA GCTTGTGAGT GGACGGTGCT GCCAGCCAGA ATTTCCGAGC 
5151 TCGCCTTGGG CCCTTAAAGT CTGTCTCCCG CCGTCTGAGA GCATCAGGGA 
5201 CGCGCCGGGC CTGCTCCTCC CGGGCCTTTG CTTAACTCGG GGCTGCACGA 
5251 TGGCTCAGTG CCGGGACCTG GAGAATCACC ACCACGAGAA GCTCCTGGAG 
5301 ATCTCTATCA GCACCCTGGA GAAGATTGTC GAGGGCGACC TGGACGAGGA 
5351 CCTGCCTAAC GACCTGCGCG CGCTTTTTGT CGATAAAGAT ACGATTGTTA 
5401 ATGCTGTCGG GGCATCGCAC GACATCCACC TCCTGAAGAT TGACAATCGA 
5451 GAAGATGAGC TGGTGACCAG AATCAACTCT TGGTGTACAC GTTTAATAGA 
5501 CAGGATTCAC AAGGATGAGA TCATGAGGAA CCGCAAGCGC GTGAAGGAGA 
5551 TCAATCAGTA CATCGACCAC ATGCAGAGCG AACTGGACAA CCTGGAATGT 
5601 GGCGACATCC TAGACTAGAT GAATGTCAGC CACAGGAGCT TCTTCAAAAC 
5651 ATAGCACCAG CCCCAGCCAG GAGAAGGAAG TGCACACGCC TCACCCGCAC 
5701 CTCTAGAGAG TTGCTGGGCA TCTCTCAACC GCGATCCCCA ACACCATTCT 
5751 TCCCCCACCC CTGGAAAAAC TTCCAAAAGT AGAGAAAATA AAGGACTCAT 
5801 TTCACAAAAA AAAAAAAAAA AA 
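
The polyadenylation signal position quoted for this insert can be checked directly against the listed sequence. The following is a minimal sketch, not the annotation pipeline actually used: the function names are hypothetical, the signal is assumed to be the canonical AATAAA hexamer, and the reported positions are assumed to be 0-based offsets (which is what the listing appears to show). Only the last 42 bases of the insert (numbered 5781 to 5822 above) are used.

```python
def find_polya_signal(seq, offset=0, signal="AATAAA"):
    """Return the 0-based offset of the last `signal` hexamer, or None.

    `offset` is the 0-based position of seq[0] within the full insert.
    """
    idx = seq.upper().rfind(signal)
    return None if idx < 0 else offset + idx


def trailing_a_run(seq):
    """Length of the uninterrupted run of A at the 3' end of `seq`."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


# Bases 5781-5822 of the insert, copied from the listing above.
tail = "AGAGAAAATA" "AAGGACTCAT" "TTCACAAAAA" "AAAAAAAAAA" "AA"

# seq[0] is base 5781 (1-based), i.e. offset 5780 (0-based).
signal_pos = find_polya_signal(tail, offset=5780)
print(signal_pos, trailing_a_run(tail))  # 5786 17
```

The strict terminal run of A begins a few bases after the reported poly A position, which suggests the original tool tolerated an interrupting base; that tolerance is not modelled here.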


BLAST Results 


Entry HS1292248 from database EMBL: 
human STS SHGC-53917. 
Score = 874, P = 3.3e-33, identities = 180/185 


Medline entries 


No Medline entry 
Peptide information for frame 1 


ORF from 202 bp to 876 bp; peptide length: 225 
category: similarity to known protein 


1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 

51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 

101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII 

151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 

201 SLSVSQPCET DSSSPQVSWK RGIEE 
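
The peptide above follows from the nucleotide listing by translating the ORF with the standard genetic code. A minimal sketch of that step is given below; the helper name `translate` is hypothetical, and only bases 201 to 240 of the insert are used, so the coordinates passed in are relative to that fragment rather than to the full clone.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start, end):
    """Translate the 1-based inclusive interval [start, end] of `dna`."""
    orf = dna.upper()[start - 1:end]
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


# Bases 201-240 of the insert; the ATG at clone position 202 is base 2 here.
fragment = "GATGAACCAGCCGTGCAACTCGATGGAGCCGAGGGTGATG"
peptide_start = translate(fragment, 2, 31)
print(peptide_start)  # MNQPCNSMEP
```

The ten residues produced match the first block of the peptide listing.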

BLASTP hits 

Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL:HSSDS22MR_1 gene: "sds22"; 
product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA 
Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143 

Entry A38439 from database PIR: 

^iS?«nr"oLSf?o^t" sds22( + ) - fission yeast (Schizosaccharomyces pombe) 
sl^^ Inf ^•P^"^^ ^^^22+ gene, complete cds. 

Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127 

Entry S43988 from database PIR: 

'^cS^fcSoSJi'^?''??^'^'' ^"^^^^ " fission yeast (Schizosaccharomyces pombe) 
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT 
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1 
regulatory subunit"; S. pombe chromosome I cosmid c4A8. 
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127 

Entry CEK10D2_5 from database TREMBL: 

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2. 

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125 


Alert BLASTP hits for DKFZphutel_20mll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_20mll, frame 1 


Report for DKFZphutel_20mll.1 

[LENGTH] 225 
[MW] 25955.87 
[pI] 4.63 
[HOMOL] PIR:S68209 sds22 protein homolog - human 1e-18 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 2e-11 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 2e-11 
[FUNCAT] organization of centrosome [S. cerevisiae, YOR373w] 2e-06 
[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-05 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05 
[FUNCAT] organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05 
[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YPL169c] 9e-04 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04 
[EC] 4.6.1.1 Adenylate cyclase 2e-06 
[PIRKW] nucleus 5e-16 
[PIRKW] duplication 2e-06 
[PIRKW] tandem repeat 2e-06 
[PIRKW] cAMP biosynthesis 2e-06 
[PIRKW] glycoprotein 2e-06 
[PIRKW] phosphorus-oxygen lyase 2e-06 
[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 5e-16 
[SUPFAM] fibromodulin 3e-07 
[SUPFAM] yeast adenylate cyclase catalytic domain homology 2e-06 
[SUPFAM] yeast adenylate cyclase 2e-06 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 1 
[KW] All_Alpha 


SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN 

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA 

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 


Prosite for DKFZphutel_20mll.1 

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 
PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006 
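
The three Prosite hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below is illustrative only and is not the Prosite scanner itself: it assumes the consensus patterns [ST]-x-[RK] for PKC_PHOSPHO_SITE and [ST]-x(2)-[DE] for CK2_PHOSPHO_SITE, written here as regular expressions, and it reports each hit as a 1-based start followed by start plus pattern length, which is the convention the table appears to use. The function name `scan` is hypothetical.

```python
import re

# Prosite consensus patterns expressed as regular expressions (assumed forms).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}

# 225-residue peptide of frame 1, copied from the listing above.
PEPTIDE = (
    "MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQL"
    "DFRNILRIDNLWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIE"
    "TIEGLDTLVNLEDLSLFNNRISKIDSLDALVKLQVLSLGNNRIDNMMNII"
    "YLRRFKCLRTLSLSRNPISEAEDYKMFICAYLPDLMYLDYRRIDDHTASV"
    "SLSVSQPCETDSSSPQVSWKRGIEE"
)


def scan(seq):
    """Return sorted (motif, start, end) hits; a lookahead keeps overlapping matches."""
    hits = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1  # 1-based start
            hits.append((name, start, start + len(m.group(1))))
    return sorted(hits)


hits = scan(PEPTIDE)
for name, start, end in hits:
    print(f"{start}->{end} {name}")
```

Run on the peptide above, the scan yields one PKC site and two CK2 sites at the same coordinates as the table.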


(No Pfam data available for DKFZphutel_20mll.1) 
DKFZphutel_20m24 
group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical 
C. elegans protein and to yeast Alg9 protein. 

This protein is a putative mannosyl transferase that is involved in the assembly of the core 
oligosaccharide Glc3Man9GlcNAc2. 

The new protein can find application in modulation of glycosylation of proteins and as a new 
enzyme for biotechnologic production processes. 

strong similarity to S.cerevisiae Alg9p 

complete cDNA, complete cds, potential start at Bp 23, few EST hits 
Alg9 is involved in the assembly of the core oligosaccharide 
Glc3Man9GlcNAc2 

HSAC381 corresponding genomic DNA (2 exons) 
HSB8954 corresponding genomic DNA (1 exon) 

Sequenced by AGOWA 

Locus: /map="11" 

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 

1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC 
51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG 

101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC 

151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA 

201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC 

251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA 

301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG 

351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT 

401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG 

451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG 

501 CTGTGTGCAA GAAGTTTGGG TTGCACGTGA GTCGAATGAT GCTAGCCTTC 

551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG 

601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG 

651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC 

701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT 

751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA 

801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG 

851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA 

901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG 

951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA 
1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT 
1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA 
1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT 
1151 CCACTTATAT GTCTCTGTGG CGCTGTGGCT CTCTCTGCAC TTCAGAAATG 
1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT 
1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT 
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA 
1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG 
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC 
1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA 
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC 
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA 
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG 
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA 
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG 
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA 
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG 
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA 
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA 
1951 ATAAAGGTCT TCTiSACATGA AAAAAAAAAA AAAAAA 

BLAST Results 


Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJ159ol, complete sequence. 
Length = 42,771 

Entry HSB8954 from database EMBL: 
CSRL-50A3-U cSRL flow sorted Chromosome 11 specific cosmid Homo 
sapiens genomic clone cSRL-50A3. 
Length = 601 


Medline entries 


96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae: 
identification of the ALG9 gene encoding a putative 
mannosyl transferase. 


Peptide information for frame 2 


ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 
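
The ORF coordinates, the frame number and the peptide length reported for each clone are tied together by simple arithmetic. The following sketch makes that relation explicit; the function name `orf_summary` is hypothetical, and it assumes 1-based inclusive coordinates that exclude the stop codon, which is consistent with the figures given for the two ORFs in this section.

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based inclusive ORF [start, end].

    The frame is 1, 2 or 3 depending on the offset of `start` from base 1.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length // 3


print(orf_summary(202, 876))   # (1, 225)
print(orf_summary(23, 1855))   # (2, 611)
```

Both results agree with the listings: frame 1 with 225 residues for the ORF at 202 to 876, and frame 2 with 611 residues for the ORF at 23 to 1855.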


1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA 
51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF 
101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS 
151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT 
201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS 
251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT 
301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT 
351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR 
401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA 
451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK 
501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK 
551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR 
601 KAKQIRKKSG G 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20m24, frame 2 

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II., N = 1, Score = 957, P = 2.7e-96 

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 533, P = 2.3e-51 

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II., N = 1, Score = 957, P = 2.7e-96 

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 533, P = 2.3e-51 


>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II.

Length = 653 

HSPs: 

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96 
Identities = 206/514 (40%), Positives = 296/514 (57%) 

Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107 

N W + FK LLS R+ A+ I+DCDE +NYWEP H +YGEGFQTWEYSP 

Sbjct: 43 NNPDNDWPFSFGSVFKMLLSIRISGAIWGIINDCDEVYNYWEPLHLFLYGEGFQTWEYSP 102 

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167 

YAIRSY Y+ LH PA+ A + KI+VF +R + + E Y + A+CKK + 

Sbjct: 103 VYAIRSYFYIYLHYIPASLFANLFGDTKIVVFTLIRLTIGLFCLLGEYYAFDAICKKINI 162 

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227 
R + F + S+GMF +S+AF+PSSFCM T + + + + + VA ++GW 

Sbjct: 163 ATGRFFILFSIFSSGMFLASTAFVPSSFCMAITFYILGAYLNENWTAGIFCVAFSTMVGW 222 

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287 
PFSA LGLPI D+L+ + K F SL+ + V+ DS+Y+GK V+APLNI LY 
Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282 

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347 
NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 
Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNIVIFAAPFGFPLS--LAYFTKVWMSQDRNVAL 340 

Query: 348 WLTLAPMYI-------WFIIFFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400 
+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 
Sbjct: 341 YQRFAPIILLAVTTAAWLLIFGSQAHKEERFLFPIYPFIAFFAALALDATNR---LCLKK 397 

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460 
++ N L++ + F +LS SR+ ++ Y T+ T + 
Sbjct: 398 LGMD------NILSILFILCFAILSASRTYSIHNNYGSHVEIYRSLNAELTKRT-NFKNF 450 

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW------QLQFIPSEFRGQLPKPFAEGPL---ATRI 511 
P+ VCVGKEW+RFPSSF +P SEFRG LPKPF + TR 
Sbjct: 451 HDPIRVCVGKEWHRFPSSFFIPQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510 

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 
+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQNY 556 
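
The percentages printed in the HSP header above (206/514 shown as 40%, 296/514 shown as 57%) can be recomputed from the raw counts. The sketch below parses such a header line and derives the percentages; the function name `hsp_percentages` is hypothetical, and integer truncation is assumed because 296/514 is about 57.6% yet is printed as 57%.

```python
import re

# Matches "Identities = 206/514" or "positives = 93/143" (case-insensitive).
HSP_RE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)", re.IGNORECASE)


def hsp_percentages(line):
    """Return {label: (matches, aligned_length, truncated_percent)} for one HSP line."""
    result = {}
    for label, num, den in HSP_RE.findall(line):
        n, d = int(num), int(den)
        result[label.capitalize()] = (n, d, 100 * n // d)  # floor, not round
    return result


header = "Identities = 206/514 (40%), Positives = 296/514 (57%)"
stats = hsp_percentages(header)
print(stats)
```

The same function accepts the lower-case form used in the shorter hit summaries, for example "identities = 61/143, positives = 93/143".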


Pedant information for DKFZphutel_20m24, frame 2 


Report for DKFZphutel_20m24.2 

[LENGTH] 611 
[MW] 69863.78 
[pI] 8.91 
[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93 
[FUNCAT] [S. cerevisiae, YNL219c] 4e-69 
[FUNCAT] 01.05.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-69 
[PIRKW] glycosyltransferase 9e-68 
[PIRKW] transmembrane protein 9e-68 
[PIRKW] hexosyltransferase 9e-68 
[PROSITE] MYRISTYL 9 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PKC_PHOSPHO_SITE 6 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] TRANSMEMBRANE 7 
[KW] LOW_COMPLEXITY 6.71 % 


SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 


MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 

ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccc^ 
MMMMMM 

AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH 
. . .xxxxxxxxxxxxx 

MMMMMMMMMMMMMMMMM „ 

« M 

AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

cchhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

MMMMMMMMMMMMMMMMMMMMMMMMNIMMM 

TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 
xxxxxxxxxxxxx 

cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 
MMMMMMMMMMMMMM 

LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLYNVFTPHGPDLYGT 

hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 
MMMMMMM . MMMMMMMMMMMMM^1^1^1MM^1MM 

EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 
xxxxxxxxxxxxxxx 

cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhh^^ 
MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 
SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 

SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 


FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL 
hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee 

MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . 

FGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL 

eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc 

LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD 

ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc 

TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR 
cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc 

KAKQIRKKSGG 
hhhhhhccccc 


Prosite for DKFZphutel_20m24.2 

PS00001 77->81 ASN_GLYCOSYLATION PDOC00001 
PS00001 593->597 ASN_GLYCOSYLATION PDOC00001 
PS00004 606->610 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005 
PS00005 133->136 PKC_PHOSPHO_SITE PDOC00005 
PS00005 541->544 PKC_PHOSPHO_SITE PDOC00005 
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 572->575 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006 
PS00006 457->461 CK2_PHOSPHO_SITE PDOC00006 
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006 
PS00006 545->549 CK2_PHOSPHO_SITE PDOC00006 
PS00006 553->557 CK2_PHOSPHO_SITE PDOC00006 
PS00008 12->18 MYRISTYL PDOC00008 
PS00008 14->20 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 47->53 MYRISTYL PDOC00008 
PS00008 166->172 MYRISTYL PDOC00008 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 222->228 MYRISTYL PDOC00008 
PS00008 234->240 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphutel_20m24.2) 




WO01/J2659 


PCT/IBOO/01496 


DKFZphutel_21dl5 


group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

Sequenced by MediGenomix 
Locus: /chromosome="3" 

Insert length: 5292 bp 

Poly A stretch at pos. 5273, polyadenylation signal at pos. 5252 


1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 
51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 
101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 
151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 
201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 
251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 
301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 
351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 
401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 
451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 
501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 
551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 
601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 
651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 
701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 
751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 
801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 
851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 
901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 
951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 
1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 
1051 CAGCCAGATC CTGCCTACTG AAGAGTATGA AGAGGCAATG AGCACTATGC 
1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 
1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 
1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 
1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTCC TATCCCCGTG 
1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 
1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 
1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 
1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC 
1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 
1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 
1601 GAGGCCAGCT AGAGCTTGGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 
1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 
1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 
1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 
1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 
1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 
1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 
1951 GTGAAGAATC TGGCCGTGAC CTGGGTGCCA CACTGCTATA GGCCCCAGAA 
2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 
2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC 
2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 
2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 
2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 
2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 
2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 
2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 
2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 
2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG 
2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT 
2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 
2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA 
2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 
2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 
2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 
2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT 
2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG 
2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG 
3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA 
3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT 
3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT 
3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC 
3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA 
3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC 
3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT 
3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA 
3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA 
3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA 
3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA 
3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC 
3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG 
3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT 
3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA 
3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT 
3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC 
3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG 
3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG 
3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG 
4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG 
4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT 
4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT 
4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT 
4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT 
4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC 
4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA 
4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC 
4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG 
4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC 
4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAACTGG GCTATTACTC 
4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC 
4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT 
4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC 
4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG 
4751 TGACACACGG AGGCACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC 
4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG 
4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG 
4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG 
4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG 
5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC 
5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGGTTCCCA GCCGCGGGTC 
5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA 
5151 CTAAAGGTCT GGCCAATGTC TTGCCCCACC CCGCCAGCCG CGATACGGCG 
5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT 
5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG 


BLAST Results 


Entry HSU64252 from database EMBL; 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA 
101 RARPGCHGGS GGDRPAA 


BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 1 
No Alert BLASTP hits found 

Peptide information for frame 2 


ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC 
51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL EVHYSNGDES SDPGPQHRAQ 
101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 
151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2 
Score = 106, P = 0.0067 

>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298 

HSPs : 

Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 

Identities = 36/103 (34%), Positives = 44/103 (42%) 

Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144 

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP---PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLSGPGRLGPFWAARSGAPAARCAP 189 

AAP AA ++R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 8/21 (38%), Positives = 9/21 (42%) 

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 
Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232 

Pedant information for DKFZphutel_21dl5, frame 1 

Report for DKFZphutel_21dl5.1 

[LENGTH] 117 

[MW] 11797.32 

[pI] 10.68 

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22 

[KW] LOW_COMPLEXITY 38.46 % 

SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 
SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc 

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

(No Prosite data available for DKFZphutel_21dl5.1) 
Pedant information for DKFZphutel_21dl5, frame 2 

Report for DKFZphutel_21dl5.2 

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 


SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG 

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 

MEM 

SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . .xxxx 

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 

MEM 

SEQ GAPAARCAPFP 

SEG xxxxxxxxx . . 

PRD ccccccccccc 

MEM 

(No Prosite data available for DKFZphutel_21dl5.2) 
(No Pfam data available for DKFZphutel_21dl5.2) 
DKFZphutel_22d2 


group: signal transduction 


DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the ras 
protein. Additionally, the putative protein contains an EF-hand for calcium-binding. 

G-proteins are involved in various signal transduction pathways, transferring the signal of a 
cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 


similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and yAL048c 


Sequenced by BMFZ 
Locus: /map="17" 

Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 


1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC 
51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC 
101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT 
151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT 
201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC 
251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCTAA TGTCATCTGT 
301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG 
351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT 
401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG 
451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA 
501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC 
551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG 
601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC 
651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC 
701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC 
751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT 
801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA 
851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC 
901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA 
951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT 
1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT 
1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA 
1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC 
1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA 
1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC 
1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC 
1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT 
1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA 
1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT 
1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG 
1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA 
1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC 
1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA 
1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG 
1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG 
1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT 
1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG 
1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT 
1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG 
1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC 
2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC 
2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA 
2101 AAATGGGTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG 
2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT 
2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG 
2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA 
2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA 
2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT 
2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC 
2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT 
2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT 
2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG 
2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG 
2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 
2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 
2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 
2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 
2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 
2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 
2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 
3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 
3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 
3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 
3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 
3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 
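
The header for this clone reports a poly(A) stretch and no polyadenylation signal. The following is a minimal sketch of how such a check could be reproduced, assuming the canonical AATAAA hexamer and a 40-nt search window upstream of the tail. The function names, the window size and the minimum tail length are illustrative assumptions and are not taken from the annotation pipeline used for this listing.

```python
import re


def find_polya_tail(seq, min_len=10):
    """Return the 1-based start of a trailing run of A's, or None if absent."""
    match = re.search(r"A{%d,}$" % min_len, seq.upper())
    return match.start() + 1 if match else None


def find_polya_signal(seq, window=40, signal="AATAAA"):
    """Return the 1-based start of the last signal hexamer lying entirely
    within `window` nt upstream of the poly(A) tail, or None."""
    seq = seq.upper()
    tail = find_polya_tail(seq)
    if tail is None:
        return None
    upstream_start = max(0, tail - 1 - window)
    idx = seq.rfind(signal, upstream_start, tail - 1)
    return idx + 1 if idx != -1 else None


# Last 47 nt of the insert (positions 3201 onward in the listing above).
tail_region = "GTGTGTTATATGATAAATGTACATACATGG" + "A" * 17
print(find_polya_tail(tail_region), find_polya_signal(tail_region))
```

Coordinates returned here are relative to the fragment passed in, so they need an offset before being compared with positions quoted in the listing.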


BLAST Results 


Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS *** MFl-related locus. Direct Submission; 

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396 

Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 


1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER 
51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL 
101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK 
151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN 
201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG 
251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE 
301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC 
351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT 
401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR 
451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN 
501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK 
551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP 
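
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked against the nucleotide listing. The sketch below assumes the standard genetic code and assumes that the quoted ORF end is the last coding nucleotide (stop codon excluded), which is consistent with the figures given here. The helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate in frame 1; unreadable codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Nucleotides 64-93 of the DKFZphutel_22d2 insert, copied from the listing.
orf_start = "ATGAAGAAAGACGTGCGGATCCTGCTGGTG"
print(orf_peptide_length(64, 1803), translate(orf_start))
```

Translating short windows in this way is also a practical means of resolving OCR-damaged bases, since the peptide listing constrains the codon.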

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11., N = 1, Score = 1357, P = 1.1e-138 

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein"; 
S.pombe chromosome III cosmid c320., N = 1, Score = 889, P = 4.4e-89 

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid 
C47C12., N = 2, Score = 408, P = 5.6e-74 

PIR:S51971 probable membrane protein YAL048c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 677, P = 1.3e-66 


>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11. 

Length =625 

HSPs: 



Score = 1357 (203.6 bits), Expect = 1.1e-138, P = 1.1e-138 
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLSIKEE 68 

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKSDLV 123 

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQSFGEYHETPVILVGNKSDGT 128 

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEIFYYAQKAVIYPTRPLYDADTKQLTD 187 

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243 

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALIRVFKICDRDNDGYLSDTELNDFQKLCFGIPLTSTALEDVKRAVSDGCPDGVAN 247 

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTELNH 303 

L L GFL+LH LFI+RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS----AVTVTRDKKIDLQKKQTQRNVF 419 

+ W +TT +++ + E L YLG+ + +A ++ VTR++K DL+ T R VF 

Sbjct: 368 AYWMMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSIRVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDI 476 

+C V+G K+ GK+ +Q+L GR + +I H S + IN V V + KYLLL ++ 
Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486 

Query: 477 SESEFLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQ 536 
S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 

Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546 

Query: 537 EYSISPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 580 

+ + P +FCR+ ++P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPIKFSTGNIGQSSSPIFEQLAMMAVYP 590 
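
The percentages printed beside the identity and positive counts in these HSP summaries appear to be truncated rather than rounded (for example 36/103 is shown as 34%). A short sketch that parses a summary line and recomputes the figures with integer division follows. The regular expression and function name are illustrative assumptions, not part of BLAST itself.

```python
import re

HSP_SUMMARY = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def hsp_percentages(line):
    """Return (identity %, positive %) truncated to whole percent."""
    match = HSP_SUMMARY.search(line)
    if match is None:
        raise ValueError("not an HSP summary line: %r" % line)
    ident, ident_len, pos, pos_len = map(int, match.groups())
    # Integer division truncates, matching the figures printed in the reports.
    return ident * 100 // ident_len, pos * 100 // pos_len


print(hsp_percentages("Identities = 263/582 (45%), Positives = 380/582 (65%)"))
print(hsp_percentages("Identities = 36/103 (34%), Positives = 44/103 (42%)"))
```

Recomputing the percentages is a quick way to spot OCR damage in the fraction or in the printed value.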


Pedant information for DKFZphutel_22d2, frame 1 


Report for DKFZphutel_22d2.1 


[LENGTH] 580 
[MW] 66541.61 
[pI] 5.56 
[HOMOL] TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid K08F11. 1e-149 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-81 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR055w] 3e-11 
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 4e-08 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 4e-08 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, (illegible)] 1e-07 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, (illegible)] (illegible) 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, (illegible)] (illegible) 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 9e-04 
[BLOCKS] BL00410A Dynamin family proteins 
[SCOP] dlcfuaa 3.25.1.3.10 Rap1A (Human (Homo sapiens)) 5e-59 
[PIRKW] transmembrane protein 1e-79 
[PIRKW] (illegible) 
[PIRKW] acetylated amino end (illegible) 
[PIRKW] (illegible) 
[PIRKW] signal transduction 1e-07 
[PIRKW] transforming protein 3e-09 
[PIRKW] (illegible) 
[PIRKW] alternative splicing 4e-08 
[PIRKW] P-loop 1e-10 
[PIRKW] lipoprotein 7e-10 
[PIRKW] proto-oncogene 3e-09 
[PIRKW] methylated carboxyl end 3e-09 
[PIRKW] membrane protein 3e-09 
[PIRKW] GTP binding 1e-10 
[PIRKW] thiolester bond 7e-10 
[PIRKW] ras transforming protein 1e-10 
[PROSITE] ATP_GTP_A 2 
[PROSITE] MYRISTYL 3 
[PROSITE] EF_HAND 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Irregular 
[KW] 3D 


SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSE 

Ijai- . , .EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC 

SEQ AEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS 

Ijai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT 

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE 

Ijai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH 

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG 

Ijai- 

SEQ vadsgltlkgflflhtlfiqrgrhettwtvlrrfgydddldltpeylfpllkippdctte 

Ijai- 

SEQ lnhhaylflqstfdkhdldrdcalspdelkdlfkvfpyipwgpdvnntvctnergwityq 

Ijai- 

SEQ gflsqwtlttyldvqrcleylgylgysilteqesqasavtvtrdkkidlqkkqtqrnvfr 

Ijai- 

SEQ cnvigvkncgksgvlqallgrnlmrqkkiredhksyyaintvyvygqekylllhdisese 

Ijai- 

SEQ flteaeiicdvvclvydvsnpksfeycarifkqhfmdsripclivaaksdlhevkqeysi 

Ijai- 

SEQ sptdfcrkhkmpppqaftcntadapskdifvklttmamyp 

Ijai- 


Prosite for DKFZphutel_22d2.1 

PS00001 118->122 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005 
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005 
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005 
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005 
PS00006 59->63 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006 
PS00006 311->315 CK2_PHOSPHO_SITE PDOC00006 
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006 
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006 
PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006 
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006 
PS00007 153->161 TYR_PHOSPHO_SITE PDOC00007 
PS00007 376->384 TYR_PHOSPHO_SITE PDOC00007 
PS00007 153->162 TYR_PHOSPHO_SITE PDOC00007 
PS00007 448->457 TYR_PHOSPHO_SITE PDOC00007 
PS00008 240->246 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 
PS00008 433->439 MYRISTYL PDOC00008 
PS00017 13->19 ATP_GTP_A PDOC00017 
PS00017 425->433 ATP_GTP_A PDOC00017 
PS00018 197->210 EF_HAND PDOC00018 


Pfam for DKFZphutel_22d2.1 


HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM *->KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 

++L+G+ VGK++L ++ EF+EE +P ++ T ++ +++ 
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIr.NWweEIr 

ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ I + 
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102 

HMM RHCDrDENVPIMLVGHKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKT 

+ D+D+ P +LVGNK+DL + ++T + +E+SAK+ 

Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151 

HMM NiNVEEAFMEIvRellqrMqeqNqteNinidQpsrnrkrCCCIM* 

N+ E F+ + +++L + +++ +++++ + C+ 

Query 152 LKNISELFYYAQKAVLHPT GPLYCPEEKEMK-PACI-- 186 
DKFZphutel_22el2 


group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein with similarity to yeast, C. elegans, 
Drosophila and mammalian proteins. 

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway 
involving the EGF receptor. 

The new protein can find application in modulating the cornichon-modulated signal transduction 
pathway and also the EGF receptor signaling processes. 


strong similarity to S.cerevisiae YGL054C and cornichon 
complete cDNA, complete cds, EST hits 

cornichon is required for signal transduction in the EGF-receptor 
signal processing 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 519 bp 

Poly A stretch at pos. 499, no polyadenylation signal found 


1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 
51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 
101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 
151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT 
201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 
251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 
301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 
351 TTGAAGTCAG CCTACACTAC AGTGCACAGT TGAGGAGCCA GAGACTTCTT 
401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 
451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 
501 TGAAAAAAAA AAAAAAAAA 
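
Numbered sequence blocks such as the one above can be joined into a single string and checked for extraction damage at the same time. The sketch below assumes each line starts with the 1-based coordinate of its first base followed by whitespace-separated groups. The function name and the strict A/C/G/T alphabet are illustrative choices.

```python
def parse_numbered_block(lines, alphabet="ACGT"):
    """Join coordinate-prefixed sequence lines into one string.

    Raises ValueError when a line's leading coordinate does not match the
    running length, or when a group contains a character outside `alphabet`.
    """
    seq_len = 0
    groups_out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        coord = int(fields[0])
        if coord != seq_len + 1:
            raise ValueError("line starts at %d, expected %d" % (coord, seq_len + 1))
        for group in fields[1:]:
            if set(group) - set(alphabet):
                raise ValueError("unexpected character in group %r" % group)
            groups_out.append(group)
            seq_len += len(group)
    return "".join(groups_out)


block = [
    "1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG",
    "51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC",
]
print(len(parse_numbered_block(block)))
```

A coordinate split by OCR (for example `4 301` instead of `4301`) or a stray non-nucleotide character fails the check instead of silently corrupting the joined sequence.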


BLAST Results 


No BLAST result 


Medline entries 


95300228: 

cornichon and the EGF receptor signaling process are necessary for both 
anterior-posterior and dorsal-ventral pattern formation in Drosophila. 


Peptide information for frame 1 


ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 


1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 185, P = 5.7e-17 

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog"; 
S.pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12 

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 162, P = 4.8e-12 

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA, 
complete cds., N = 1, Score = 141, P = 8e-10 

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09 

>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) 

Length = 138 

HSPs: 

Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 35/85 (41%), Positives = 56/85 (65%) 

Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 60 

M A +F+ +++ C +F V+F I +DLE DYIN CSK+NK + PE H +++ 
Sbjct: 1 MGAWLFILAVVVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFIFLLNLPVATWNIYRM 85 

L L++ +WF+FLLNLPV +N+ ++ 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 31 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 7/9 (77%), Positives = 9/9 (100%) 


Query: 82 IYRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 


Pedant information for DKFZphutel_22el2, frame 1 


Report for DKFZphutel_22el2.1 


[LENGTH] 92 

[MW] 10614.98 

[pI] 5.04 

[HOMOL] PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces cerevisiae) 
5e-14 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054c] 
2e-15 

[PIRKW] transmembrane protein 2e-11 

[PROSITE] CK2_PHOSPHO_SITE 3 

[KW] SIGNAL_PEPTIDE 33 

[KW] TRANSMEMBRANE 2 


SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 
PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh 
MEM MMMMMMMMMM 


SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND 
PRD hhhhhhhheeecccccchhhhhhhhhhhhccc 
MEM MMMMMMMMMMMMMMMMMMM . .MMMMMMM 


Prosite for DKFZphutel_22el2.1 

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006 

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006 

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
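
The three CK2 sites listed above can be reproduced by scanning the peptide with the PROSITE PS00006 pattern, [ST]-x(2)-[DE]. The sketch below uses a zero-width lookahead so that overlapping hits (such as those starting at residues 26 and 28) are all reported. The function name is hypothetical, and reporting the end coordinate as start plus pattern width is an assumption inferred from the ranges printed in this listing.

```python
import re

# PROSITE PS00006 (casein kinase II phosphorylation site): [ST]-x(2)-[DE].
CK2_SITE = re.compile(r"(?=[ST]..[DE])")


def scan_sites(pattern, peptide, width):
    """Return (start, start + width) pairs using 1-based start positions."""
    return [(m.start() + 1, m.start() + 1 + width) for m in pattern.finditer(peptide)]


peptide = (
    "MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIP"
    "ELIGHTIVTVLLLMSLHWFIFLLNLPVATWNIYRMILALIND"
)
for start, end in scan_sites(CK2_SITE, peptide, 4):
    print("PS00006 %d->%d CK2_PHOSPHO_SITE" % (start, end))
```

A plain `re.finditer` without the lookahead would skip the hit at residue 28, because it overlaps the one at residue 26.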


(No Pfam data available for DKFZphutel_22el2.1) 




DKFZphutel_22n2 


group: uterus derived 

DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chr11 linkage group" 

Insert length: 1556 bp 

Poly A stretch at pos. 1534, no polyadenylation signal found 


1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 
51 ACTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 
101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 
151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 
201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 
251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG 
301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT 
351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC 
401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC 
451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC 
501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTGACCTGG ACCACAAACT 
551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT 
601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG 
651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT 
701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG 
751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG 
801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA 
851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGGTCCC 
901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT 
951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT 
1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 
1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 
1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 
1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCTGCTGTT TCAAGGCTGA 
1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 
1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT 
1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG 
1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA 
1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG 
1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC 
1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 
1551 AAAAAA 


BLAST Results 


Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 255 bp to 1166 bp; peptide length: 304 
Category: putative protein 
1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD 
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK 
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW 
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY 
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP 
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET 
301 LTFS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22n2, frame 3 

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 132, P = le-05 


>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae) 
Length = 562 

HSPs: 

Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05 
Identities = 24/63 (38%), Positives = 35/63 (55%) 

Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 
D 

Sbjct: 557 IID 559 

Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04 
Identities = 20/52 (38%), Positives = 33/52 (63%) 

Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55 

N+ +E ++E+ +E + T + + N+DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545 


Pedant information for DKFZphutel_22n2, frame 3 


Report for DKFZphutel_22n2.3 


[LENGTH] 304 

[MW] 34285.85 

[pI] 4.37 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 11.84 % 


SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc 

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc 

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID 

SEG 

PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch 

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI 

SEG 

PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh 

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET 

SEG 

PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc 

SEQ LTFS 

SEG 

PRD cccc 
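
The LOW_COMPLEXITY percentage in the Pedant header appears to be the share of residues masked with `x` on the SEG lines. A small sketch of that arithmetic follows, assuming the SEG mask for this clone is read as runs of 10 and 26 masked residues (OCR may have altered the run lengths, so the counts are an assumption). The function name is illustrative.

```python
def low_complexity_percent(seg_lines, peptide_length):
    """Percentage of residues masked with 'x' across all SEG lines."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# Mask runs as read from the SEG line of DKFZphutel_22n2.3 (304 residues).
seg_22n2 = ["x" * 10 + " " + "x" * 26]
print(low_complexity_percent(seg_22n2, 304))
```

The same calculation applied to a 117-residue peptide with 45 masked residues gives 38.46, which is consistent with the value reported for DKFZphutel_21dl5.1.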


Prosite for DKFZphutel_22n2.3 

PS00001 4->8 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 290->294 ASN_GLYCOSYLATION PDOC00001 
PS00004 17->21 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006 
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006 
PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00006 185->189 CK2_PHOSPHO_SITE PDOC00006 
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 
PS00009 280->284 AMIDATION PDOC00009 


(No Pfam data available for DKFZphutel_22n2.3) 
DKFZphutel_22o2 


group: uterus derived 

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif e. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



similarity to S.pombe SPBC3E7.03c 
complete cDNA, complete cds, EST hits 
Sequenced by BHFZ 
Locus: /map="11p15.5"
Insert length: 2714 bp

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677

1 GCAGGGCACG GTGGGGGCTG AGATCGTTTC CTGTTGGAAC TTCTGGCCCA 
51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 
101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 
151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGTTA AAGCGCCACC 
201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 
251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG 
301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG
351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 
401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 
451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC
501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 
551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 
601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 
651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 
701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 
751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 
801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 
851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 
901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 
951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 
1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA
1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 
1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 
1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 
1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 
1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC 
1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC 
1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC 
1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT 
1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC 
1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 
1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT 
1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 
1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA 
1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG 
1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT 
1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC 
1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG 
1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT 
1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG 
2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT 
2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC 
2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA 
2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC 
2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG 
2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA 
2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG 
2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG 
2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT 
2451 CCTGTCTTCA CTGAGGGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG 
2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 
2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 
2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 
2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA 
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BLAST Results 


Entry AF015416 from database EMBL: 

Homo sapiens chromosome 11 from 11p15.5 region, complete sequence.
Score = 3356, P = 2.0e-144, identities = 672/673

Entry HS263253 from database EMBL:
human STS SHGC-15914.
Score = 1143, P = 9.0e-46, identities = 245/255


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 


1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL
51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV
101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL
151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL
201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH
251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST
301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK
351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE
401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD
451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR
501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD
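
The ORF coordinates and the peptide length quoted in each "Peptide information" block are linked by simple arithmetic. The following is a minimal Python sketch of that check, not part of the original listing. It assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; both assumptions are inferred from the figures in this listing, and `peptide_length` is a hypothetical helper name.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates.

    Assumption: the stop codon is not included in the quoted range.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three bases")
    return n_bases // 3


# Coordinates as quoted for DKFZphutel_22o2, frame 2.
print(peptide_length(326, 1936))  # 537
```

The same check applied to the other ORF ranges quoted in this section gives the stated peptide lengths as well.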

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_22o2, frame 2

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023


>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7.
Length = 362

HSPs:

Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03
Identities = 71/289 (24%), Positives = 124/289 (42%)

Query: 215 SQETERAM-EILKVLFNITLDSIKGEVDEEDAALYRHLGTLLRHCVMIATAGDRTEEFHG 273 

SQ+ E + EIL++LF 1+ S E DE+ L L+ + + 

Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILL LVIEIWASSLNNNPNSPLRW 65 

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLIFLEKRLHKTH RL 327 

HA N LL NL L LD + + T + +1 + +LEK L+ + 
Sbjct: 66 HATNALLSFNLQLLSLDQAIYVSEIACQT LQSILISREVEYLEKGLNLCFDIAAKY 121 

Query: 328 KESVAPVLSVLTECARMHRPARKFLKAQGWPPPQVLPPLRDVRTRP-EVGEMLRNKLVRL 386 

+ 4-+ P+L++L + +LPDR++G+R L+RL 

Sbjct: 122 QNTLPPILAILLSLLSFFNIKQNL SMLLFPTNDDRKQSLQKGKSFRCLLLRL 173 

Query: 387 MT-HLDTDVKRVAAEFLFVLCSESVPRFIKYTGYGNAAGLLAARGLMAGGRPEGQYS 442 

+T++ ALLC + + GGAG+ M P+ + 

Sbjct: 174 LTIPIVEPIGTYYASLLNELCDGDSQQIARIFGAGYAMGISQHSETMPFPSPLSKAASPV 233 

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV— EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499 

+ + +E +I+PfTG + +E +++E+KE EA +L +F +L +N 

Sbjct: 234 EQKNSRGQENTEENNLAIDPITGSMCTNRNKSQRLE-LSQEEKEREAERLFYLFQRLEKN 292 
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Query: 500 RVIQ 503 
IQ 

Sbjct: 293 STIQ 296 


Pedant information for DKFZphutel_22o2, frame 2 
Report for DKFZphutel_22o2.2 

[LENGTH] 537 

[MW] 60372.53

[pI] 5.20

[BLOCKS] BL00415L Synapsins proteins

[PROSITE] MYRISTYL 4

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 9.50 % 

SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP 
SEG , 

PRO ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhlihhhhhhhhhhccccc 

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC
SEG 

PRO cceeeeeccccccccccccccccchhhhhhhhlihhhceeeeccccccccchhhhhhhhh^ 

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ

xxxxxxxxxxxxxxx . . . 

PRO ti^hhccccchhhhlihhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh 

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV 
SEG 

PRO hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccclihh 

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST 

SEG 

PRO hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc 

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP 
SEG 

PRO eGeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc 


SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG

SEG jjj^j^ 

PRO cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc 

SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE 

SEG xxxxxxxxxxxxxxx xxxxxxxxx 

PRD chhhhhhhhhccccccccccccccocccchlihhhhhhhccccccceeecccccccclihlih 

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD 

xxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhtihticccccccc 


Prosite for DKFZphutel_22o2.2


PS00001    230->234   ASN_GLYCOSYLATION   PDOC00001
PS00005    61->64     PKC_PHOSPHO_SITE    PDOC00005
PS00005    69->72     PKC_PHOSPHO_SITE    PDOC00005
PS00005    84->87     PKC_PHOSPHO_SITE    PDOC00005
PS00005    117->120   PKC_PHOSPHO_SITE    PDOC00005
PS00005    145->148   PKC_PHOSPHO_SITE    PDOC00005
PS00005    218->221   PKC_PHOSPHO_SITE    PDOC00005
PS00005    235->238   PKC_PHOSPHO_SITE    PDOC00005
PS00005    324->327   PKC_PHOSPHO_SITE    PDOC00005
PS00005    463->466   PKC_PHOSPHO_SITE    PDOC00005
PS00005    508->511   PKC_PHOSPHO_SITE    PDOC00005
PS00006    12->16     CK2_PHOSPHO_SITE    PDOC00006
PS00006    34->38     CK2_PHOSPHO_SITE    PDOC00006
PS00006    52->56     CK2_PHOSPHO_SITE    PDOC00006
PS00006    99->103    CK2_PHOSPHO_SITE    PDOC00006
PS00006    104->108   CK2_PHOSPHO_SITE    PDOC00006
PS00006    263->267   CK2_PHOSPHO_SITE    PDOC00006
PS00006    371->375   CK2_PHOSPHO_SITE    PDOC00006
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PS00006    388->392   CK2_PHOSPHO_SITE    PDOC00006
PS00006    442->446   CK2_PHOSPHO_SITE    PDOC00006
PS00006    447->451   CK2_PHOSPHO_SITE    PDOC00006
PS00006    491->495   CK2_PHOSPHO_SITE    PDOC00006
PS00006    515->519   CK2_PHOSPHO_SITE    PDOC00006
PS00006    530->534   CK2_PHOSPHO_SITE    PDOC00006
PS00008    57->63     MYRISTYL            PDOC00008
PS00008    420->426   MYRISTYL            PDOC00008
PS00008    424->430   MYRISTYL            PDOC00008
PS00008    430->436   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphutel_22o2.2)
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DKF2phutel_23el3 


group: metabolism 


DKFZphutel_23el3 encodes a novel 196 amino acid protein with similarity to 27K heat shock
proteins.

The novel protein contains a serine protease signature of the subtilase family with an aspartic
acid-containing active site (Prosite motif SUBTILASE_ASP, residues 28-39). Subtilases are an
extensive family of serine proteases whose catalytic activity is provided by a charge relay
system similar to that of the trypsin family of serine proteases, but which arose independently
by convergent evolution. The sequences around the residues of the catalytic triad (aspartic
acid, serine and histidine) are completely different from those around the analogous residues
in the trypsin serine proteases. Thus the novel protein is a new member of this family.

The new protein can find application in the modulation of proteinase activity in cells and as a
new enzyme for proteomics and biotechnological production processes.


heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group"
Insert length: 1854 bp 

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810 


1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC 
51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC 
101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG 
151 GGGGTTACCT TTGGGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC 
201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA 
251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG 
301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC 
351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT 
401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG 
451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC 
501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC 
551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG 
601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT 
651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA 
701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG 
751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG 
801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT 
851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA 
901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC 
951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC 
1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA 
1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG 
1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT 
1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA 
1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC 
1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC 
1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC 
1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT 
1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC 
1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC 
1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT 
1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC 
1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT 
1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA 
1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT 
1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA 
1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAATA 
1851 AAAA 


BLAST Results 


Entry HS286348 from database EMBL: 
human STS TIGR-A002J47.
Score = 510, P = 1.2e-16, identities = 102/102
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Medline entries 


95394379:

Cloning and sequencing of a cDNA encoding the canine HSP27 protein.
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 


Peptide information for frame 3 


ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39) 


1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD
51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV
101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD
151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT
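
The peptide above is the conceptual translation of the insert, read from the ORF start in the stated frame. The following is a minimal Python sketch of that step for the first ten codons, not part of the original listing. It assumes the standard genetic code; `translate` is a hypothetical helper name, and the input is bases 354 to 383 of the DKFZphutel_23el3 insert as printed above.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; spaces are ignored, '*' marks a stop."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 354-383 of the insert (start of the ORF in frame 3).
print(translate("ATGGCTG ACGGTCAGAT GCCCTTCTCC TGC"))  # MADGQMPFSC
```

Translating from the ORF start in this way is also how individual residues that are ambiguous in the printed peptide can be checked against the nucleotide sequence.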

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_23el3, frame 3 

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P =
4.3e-27

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus
heat shock protein HSP27 internal deletion variant b mRNA, complete
cds., N = 1, Score = 301, P = 8.9e-27


>PIR:JC4244 heat-shock 27K protein - dog
Length = 209

HSPs:

Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27
Identities = 80/182 (43%), Positives = 102/182 (56%)

Query: 1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58 

M + ++PFS PS DPFRD P SRL D FG+ P++ WW S 
Sbjct: 1 MTERRVPFSLLRSPSW— DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50 

Query: 59 AWPGTLRSGMVP RGPTATARFGVPAEGR--TPPPFPG EPWKVCVNVHSF 105 

WFG +R +P GP A A PA R G + W+V ++V+ F 

Sbjct: 51 GWPGYVRP— IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108 

Query: 106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165 

PEEL VKTKDG VE++GKHEE+Q E G +S+ T K LP VDP V +SLSPEG L 
Sbjct: 109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168 

Query: 166 IEAPQVPPYSTFGE 179

+EAP P + E 
Sbjct: 169 VEAPMPKPATQSAE 182 
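
The identity and positive percentages in the HSP header can be recomputed from the quoted fractions. The following is a minimal Python sketch, not part of the original listing. It assumes that the report truncates to a whole number rather than rounding; this is inferred from the figures in this listing (80/182 is 43.96 per cent but is printed as 43%). `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of aligned positions, truncated rather than rounded."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // aligned


# Fractions quoted for the dog HSP27 alignment above.
print(blast_percent(80, 182), blast_percent(102, 182))  # 43 56
```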


Pedant information for DKFZphutel_23el3, frame 3 


Report for DKFZphutel_23el3.3


[LENGTH] 196
[MW] 21604.37
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[pI] 5.00
[HOMOL] heat-shock 27K protein - dog 3e-22
[BLOCKS] BL01031C
[PIRKW] blocked amino end 1e-13
[PIRKW] acetylated amino end 4e-13
[PIRKW] phosphoprotein 7e-21
[PIRKW] glycoprotein 2e-11
[PIRKW] heat shock 7e-21
[PIRKW] molecular chaperone 4e-13
[PIRKW] alternative splicing 1e-19
[PIRKW] eye lens 6e-14
[PIRKW] stress-induced protein 7e-21
[SUPFAM] alpha-crystallin 7e-21
[PROSITE] SUBTILASE_ASP 1
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Heat shock hsp20 proteins
[KW] All_Beta
[KW] LOW_COMPLEXITY 7.14 %


SEQ 
SEG 
PRD 

SEQ 
SEG 
PRD 

SEQ 
SEG 
PRD 

SEQ 
SEG 
PRD 


MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW 
XXXXXXXXXXXXXJC 

ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc 
pgtlrsgmvprgptatarfgvpaegrtpppfpgepwkvcvnvhsfkpeelmvktkdgyve' 
cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeec^^ 
vsgkheekqqeggivsknetkkiqlpaevdpvtvfaslspeglliieapqvppystexses 

eccchhhhhcccceeeeccccccccccccccceeeecccccceeeeeccccccccccccc 
sfnnelpqdsqevtct 

cccccccccceeeccc 


Prosite for DKFZphutel_23el3.3


PS00001    138->142   ASN_GLYCOSYLATION   PDOC00001
PS00005    27->30     PKC_PHOSPHO_SITE    PDOC00005
PS00005    63->66     PKC_PHOSPHO_SITE    PDOC00005
PS00005    76->79     PKC_PHOSPHO_SITE    PDOC00005
PS00005    104->107   PKC_PHOSPHO_SITE    PDOC00005
PS00005    122->125   PKC_PHOSPHO_SITE    PDOC00005
PS00005    140->143   PKC_PHOSPHO_SITE    PDOC00005
PS00006    47->51     CK2_PHOSPHO_SITE    PDOC00006
PS00006    176->180   CK2_PHOSPHO_SITE    PDOC00006
PS00008    62->68     MYRISTYL            PDOC00008
PS00008    132->138   MYRISTYL            PDOC00008
PS00136    28->39     SUBTILASE_ASP       PDOC00125


Pfam for DKFZphutel_23el3.3


HMM_NAME 
HMM 
Query 
HMH 

Query 

HMM 

Query 


Heat shock hsp20 proteins 
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*AMMrpPWDWRE DpDHFeVrMDMPGFKPEEIKVkVEDNNVLvIeG 

A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G 

ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123 

EHEREEEREDDkWWWHERIYRHFMRRFrLPENVDpDqIkAsMSdNGVLTI 
+HE E++ + + ++ F +++LP +VDP + AS+S++G+L I 

124 KHE EKQQ EGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLII 166 

TVPKpEP* 
++P ++P 
167 EAPQVPP 173 
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DKFZphutel_23gll 


group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S.pombe
SPAC31G5.12c and S. cerevisiae Maf1p.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of uterus-specific
genes.


similarity to SPAC31G5.12c and Maf1p

complete cDNA, complete cds, EST hits

Sequenced by EMBL

Locus: unknown 

Insert length: 1674 bp 

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644 


1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 
51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 
101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 
151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 
201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 
251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG 
301 CCCAGCCAGA CCCGGCCCGG CGCGGCCTGA TCTAACCCAG CCAGGCAGGC 
351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT 
401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA 
451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG 
501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC 
551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA 
601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC 
651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC 
701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC 
751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC 
801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC 
851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC 
901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 
951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 
1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 
1051 TGGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 
1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 
1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 
1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 
1251 GCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 
1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 
1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 
1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC 
1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA 
1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 
1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 
1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 
1651 AGTTTCTGTG ACTTAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


NO Medline entry 


Peptide information for frame 3 


ORF from 393 bp to 1160 bp; peptide length: 256 
Category: similarity to known protein 
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1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE 
51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL 
101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL 
151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF 
201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR 
251 VPVICI 


BLASTP hits 


Entry SPAC31G5_12 from database TREMBL:

gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c31G5.

Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127

Entry SPD656_1 from database TREMBL:

product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial
cds.

Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127

Entry S50986 from database PIR:

MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST
MAF1 PROTEIN. >TREMBL:SC19492_1 gene: "MAF1"; product: "Maf1p";
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds.
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S. cerevisiae
chromosome IV cosmid 8119.

Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133

Entry AF098499_2 from database TREMBL:

gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8.

Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252


Alert BLASTP hits for DKFZphutel_23gll, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_23gll, frame 3


Report for DKFZphutel_23gll.3


[LENGTH] 256
[MW] 28869.95
[pI] 4.51
[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe chromosome I cosmid c31G5. 4e-23
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c] 6e-13
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 5
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.81 %


SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS 


PRO cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc 

SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc 

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED

SEG 

PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc 

SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA 

SEG xxxxxxxxxxxxxxxxxx

PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc 

SEQ EETSTMEEDRVPVICI 

SEG xx

PRD cccccccccceeeccc 
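
The LOW_COMPLEXITY keyword in the Pedant report appears to be the share of residues masked in the SEG rows. The following is a minimal Python sketch of that reading, not part of the original listing. The relationship is an assumption inferred from this entry, where the SEG rows appear to mask 18 and 2 positions of a 256-residue peptide; `low_complexity_percent` is a hypothetical helper name.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues marked 'x' in the SEG rows, formatted to two decimals."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return f"{100 * masked / length:.2f}"


# SEG rows for DKFZphutel_23gll.3 as read from the report above (assumed counts: 18 and 2).
seg_rows = ["", "", "", "x" * 18, "xx"]
print(low_complexity_percent(seg_rows, 256))  # 7.81
```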
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prosite for DKF2phutel_23gll .3 

PS00001    6->10      ASN_GLYCOSYLATION   PDOC00001
PS00001    101->105   ASN_GLYCOSYLATION   PDOC00001
PS00001    132->136   ASN_GLYCOSYLATION   PDOC00001
PS00005    33->36     PKC_PHOSPHO_SITE    PDOC00005
PS00005    85->88     PKC_PHOSPHO_SITE    PDOC00005
PS00005    89->92     PKC_PHOSPHO_SITE    PDOC00005
PS00005    103->106   PKC_PHOSPHO_SITE    PDOC00005
PS00005    112->115   PKC_PHOSPHO_SITE    PDOC00005
PS00005    202->205   PKC_PHOSPHO_SITE    PDOC00005
PS00006    7->11      CK2_PHOSPHO_SITE    PDOC00006
PS00006    99->103    CK2_PHOSPHO_SITE    PDOC00006
PS00006    212->216   CK2_PHOSPHO_SITE    PDOC00006
PS00006    238->242   CK2_PHOSPHO_SITE    PDOC00006
PS00006    244->248   CK2_PHOSPHO_SITE    PDOC00006
PS00008    66->72     MYRISTYL            PDOC00008
PS00008    181->187   MYRISTYL            PDOC00008
PS00008    239->245   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphutel_23gll.3)
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group: transmembrane protein 

DKFZphutel_24cl9 encodes a novel 195 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.

unknown

membrane regions: 1

Summary: DKFZphutel_24cl9 encodes a novel 195 amino acid protein with
no similarity to known proteins.

unknown 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 

1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 

51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 

101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA

151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 

201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 

251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 

301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 

351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT 

401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC

451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA

501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG 

551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT 

601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG 

651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA 

701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA 

751 ACAAAAAAAA AAAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 


1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA 

51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET 

101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR 

151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH 

BLASTP hits 

No BLASTP hits available 




Alert BLASTP hits for DKFZphutel_24cl9, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_24cl9, frame 2

Report for DKFZphutel_24cl9.2


[LENGTH] 195
[MW] 21527.45
[pI] 9.36
[PROSITE] MYRISTYL 6
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] TRANSMEMBRANE 1


SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV
PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh
MEM

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF
PRD hhhhhhhccccccceeeeecccccccccccccccccccccccccccccceeeecccceee
MEM                                               MMMMMMMMMMMMMM

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL
PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhcchhhhh
MEM MMM

SEQ LIKALQLSEPGKEIH
PRD hhhhhhhcccccccc
MEM


Prosite for DKFZphutel_24cl9.2


PS00001    11->15     ASN_GLYCOSYLATION   PDOC00001
PS00001    34->38     ASN_GLYCOSYLATION   PDOC00001
PS00001    59->63     ASN_GLYCOSYLATION   PDOC00001
PS00005    18->21     PKC_PHOSPHO_SITE    PDOC00005
PS00005    82->85     PKC_PHOSPHO_SITE    PDOC00005
PS00005    151->154   PKC_PHOSPHO_SITE    PDOC00005
PS00006    13->17     CK2_PHOSPHO_SITE    PDOC00006
PS00008    40->46     MYRISTYL            PDOC00008
PS00008    47->53     MYRISTYL            PDOC00008
PS00008    68->74     MYRISTYL            PDOC00008
PS00008    110->116   MYRISTYL            PDOC00008
PS00008    127->133   MYRISTYL            PDOC00008
PS00008    142->148   MYRISTYL            PDOC00008
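
The Prosite hits for this entry can be reproduced by scanning the 195-residue peptide with the corresponding PROSITE patterns. The following is a minimal Python sketch, not part of the original listing. It assumes the patterns N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005), [ST]-x(2)-[DE] (PS00006) and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} (PS00008), written here as regular expressions. The "start->end" convention, with end equal to start plus motif length, is inferred from the positions in this listing; `scan` is a hypothetical helper name.

```python
import re

# PROSITE patterns expressed as regular expressions (assumed translations).
MOTIFS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": "[ST].[RK]",
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",
    "MYRISTYL": "G[^EDRKHPFYW]..[STAGCN][^P]",
}

# Peptide of DKFZphutel_24cl9, frame 2, as printed in this entry.
PEPTIDE = (
    "MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIA"
    "NSLFRRILNVTKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCET"
    "CTITRSGLTGLVIGGLYPVFLAIPVNGGLAARYQSALLPHKGNILSYWIR"
    "TSKPVFRKMLFPILLQTMFSAYLGSEQYKLLIKALQLSEPGKEIH"
)


def scan(peptide, pattern):
    """Return (start, end) pairs with a 1-based start and end = start + motif length."""
    hits = []
    # The lookahead lets overlapping matches be reported as separate sites.
    for match in re.finditer(f"(?=({pattern}))", peptide):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


for name, pattern in MOTIFS.items():
    print(name, scan(PEPTIDE, pattern))
```

Under these assumptions the scan returns the same positions as the table above for all four motif types.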


(No Pfam data available for DKFZphutel_24cl9.2)
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DKFZphutel_24ell 


group: intracellular transport and trafficking 

DKFZphutel_24ell encodes a novel 226 amino acid protein with similarity to the human/mouse Golgi
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-
bound compartment. Thus, the novel protein also seems to be involved in nucleotide sugar
transport.

The new protein can find application in modulating the transport of nucleosides and/or
nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-bound
compartment.


similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP

complete cDNA, complete cds, EST hits
potential start at 184
TRANSMEMBRANE 4

function in the transport of nucleosides and/or nucleoside derivatives
between the cytosol and the lumen of an intracellular membrane-bound compartment?
Sequenced by Qiagen
Locus: /map="8"
Insert length: 2005 bp 

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963 

1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 
51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 
101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 
151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 
201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 
251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 
301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 
351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 
401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT 
451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG 
501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 
551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 
601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 
651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 
701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 
751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC 
801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 
851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA 
901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 
951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 
1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT 
1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 
1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 
1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 
1201 TCCCTCTCTT TTGAAAATGT AAAATAAAAC CAAAAATAGA CAACTTTTTC 
1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA 
1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG 
1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 
1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT 
1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG 
1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 
1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 
1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 
1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 
1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT 
1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT 
1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 
1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT 
1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 
1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA 
2001 AAAAA 


BLAST Results 


Entry HS012351 from database EMBL: 
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human STS SHGC-31823. 
Score = 1629, P = 3.1e-67, identities = 343/354


Medline entries 


96199248: 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 


Peptide information for frame 1 


ORF from 184 bp to 861 bp; peptide length: 226
Category: strong similarity to known protein 


1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 
51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 
101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 
151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT 
201 TVLLPPYDDA TVNGAAKEPP PPYVSA
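
The ORF coordinates and the peptide length given for each frame are related by simple codon arithmetic. The following is a minimal sketch of that check, assuming the coordinates are 1-based and inclusive and that the reported span does not include the stop codon; both assumptions are inferred from the numbers in this listing and are not stated in it. The function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Coordinates are assumed to be 1-based and inclusive, and the span is
    assumed to exclude the stop codon (an assumption, see the text above).
    """
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not precede start_bp")
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


if __name__ == "__main__":
    # ORF from 184 bp to 861 bp, as reported for frame 1 above.
    print(orf_peptide_length(184, 861))
```

For the ORF above, 861 - 184 + 1 = 678 bp, which corresponds to 226 codons and agrees with the reported peptide length.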

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphute1_24e11, frame 1

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP., N
= 1, Score = 539, P = 5.3e-52

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06


>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108).

Length = 233

HSPs:

Score = 551 (82.7 bits), Expect = 2.9e-53, P = 2.9e-53
Identities = 102/221 (46%), Positives = 148/221 (66%)

Query:   9 RFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQY---NFSSSELGGDF- 64
           RFYS  CC CCHVRTGTI+LG WY+++N ++ ++L   +  P+     N     +G  +
Sbjct:  13 RFYSTRCCGCCHVRTGTIILGTWYMVVNLLMAILLTVEVTHPNSMPAVNIQYEVIGNYYS 72

Query:  65 -EFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAITVL 123
            E M D N C+  A+S+LM +I +M  YGA   +  W+IPFFCY++FDF L+ LVAI+ L
Sbjct:  73 SERMAD-NACVLFAVSVLMFIISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL 131

Query: 124 IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI 183
            Y   I+EY+ QLP +FPY+DD+++++ +CL+ I+L+F ++ + FK YLI+CVWNCY+YI
Sbjct: 132 TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFIVLVFFALFIIFKAYLINCVWNCYKYI 190

Query: 184 NGRNSSDVLVYVTSN-DTTVLLPPYDDATVNGAAKEPPPPYVSA 226
           N RN  ++ VY         +LP Y+ A V    KEPPPPY+ A
Sbjct: 191 NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233
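
The percentages printed after the identity and positive counts can be reproduced from the counts themselves. The sketch below assumes that the report truncates the percentage to a whole number rather than rounding it; this is inferred from the figures above (148/221 is about 66.97% and is reported as 66%) and is not documented in this listing. The function name `blast_percent` is hypothetical.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Return count/aligned_length as a whole-number percentage.

    The value is truncated, not rounded (an assumption based on the
    figures shown in the HSP summary lines of this listing).
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= count <= aligned_length:
        raise ValueError("count must lie between 0 and aligned_length")
    return (100 * count) // aligned_length


if __name__ == "__main__":
    # Identities = 102/221, Positives = 148/221, as in the HSP above.
    print(blast_percent(102, 221), blast_percent(148, 221))
```

Integer division is used so that the result does not depend on floating-point rounding.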


Pedant information for DKFZphute1_24e11, frame 1


Report for DKFZphute1_24e11.1


[LENGTH]   226
[MW]       25419.11
[pI]       4.65
[HOMOL]    SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40
[PROSITE]  CK2_PHOSPHO_SITE 3
[PROSITE]  TYR_PHOSPHO_SITE 1
[PROSITE]  PKC_PHOSPHO_SITE 1
[PROSITE]  ASN_GLYCOSYLATION 3
[KW]       SIGNAL PEPTIDE 49
[KW]       TRANSMEMBRANE 2
[KW]       LOW COMPLEXITY 20.80 %


SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL 

SEG xxxxxxxxxxxxxxxx 

PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc

MEM 

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI

SEG xxxxxxxxxxxxxxxxxx

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMHMMMMM^Qff<MM^1MMMMMMMMMMMMMM^04^01MMMMMMHMMMMMM 

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY 

SEG xxxxxxxxxxxxx 

PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA

SEG 

PRD eecccccccceeeeeecccccccccccccccccccccccccccccc 

MEM 


Prosite for DKFZphute1_24e11.1


PS00001   54->58     ASN_GLYCOSYLATION   PDOC00001
PS00001   187->191   ASN_GLYCOSYLATION   PDOC00001
PS00001   198->202   ASN_GLYCOSYLATION   PDOC00001
PS00005   167->170   PKC_PHOSPHO_SITE    PDOC00005
PS00006   56->60     CK2_PHOSPHO_SITE    PDOC00006
PS00006   128->132   CK2_PHOSPHO_SITE    PDOC00006
PS00006   196->200   CK2_PHOSPHO_SITE    PDOC00006
PS00007   186->195   TYR_PHOSPHO_SITE    PDOC00007
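
The ASN_GLYCOSYLATION entries (PS00001) correspond to the PROSITE pattern N-{P}-[ST]-{P}: an asparagine, any residue except proline, a serine or threonine, and again any residue except proline. As a minimal sketch, the start positions listed above can be located with a regular expression. The function name `n_glycosylation_sites` and its `first_residue` parameter are hypothetical, and the sketch reports only the start of each site because the end-position convention of the table is not stated in this listing.

```python
import re
from typing import List

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}.
# The lookahead allows overlapping sites to be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(peptide: str, first_residue: int = 1) -> List[int]:
    """Return the 1-based start positions of N-glycosylation motifs.

    first_residue is the position of peptide[0] in the full protein, so
    that a fragment can be scanned with positions reported in full-length
    numbering.
    """
    return [m.start() + first_residue
            for m in N_GLYCOSYLATION.finditer(peptide.upper())]


if __name__ == "__main__":
    # Residues 51-60 of the peptide listed for frame 1 above.
    print(n_glycosylation_sites("DQYNFSSSEL", first_residue=51))
```

Scanning residues 51 to 60 of the peptide reports a site starting at position 54, in agreement with the first row of the table.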


(No Pfam data available for DKFZphute1_24e11.1)


DKFZphute1_24j6


group: cell structure and motility 

DKFZphute1_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell
adhesion regulator (CAR1).

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of
cell-cell adhesion. It contains an RGD cell attachment site.

The new protein can find application in the modulation of cell-cell adhesion.


strong similarity to rat CAR1, A.thaliana T19C21.5

complete cDNA, complete cds, EST hits

potential frame shift at Bp 1241 according to CAR1

but frame shift might be in CAR1 sequence!

ESTs T73366 AA362984 confirm this sequence

Sequenced by Qiagen

Locus: /map="939.9 cR from top of Chr2 linkage group"
Insert length: 3333 bp

Poly A stretch at pos. 3316, no polyadenylation signal found


1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 
51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 
101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 
151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 
201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 
251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG 
301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 
351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 
401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG 
451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA 
501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 
551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 
601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG 
651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 
701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 
751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT
801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG 
851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 
901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 
951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 
1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 
1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 
1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 
1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 
1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 
1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 
1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 
1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 
1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 
1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA 
1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 
1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 
1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 
1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC 
1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT 
1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC 
1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 
1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 
1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 
1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 
2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 
2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 
2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 
2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 
2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 
2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT 
2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT 
2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC 
2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT 
2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA 
2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG
2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA
2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT
2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT
2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT
2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC
2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG
2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG
2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC
2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT
3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC
3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT
3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA
3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT
3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC
3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA
3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA


BLAST Results 


Entry HS389210 from database EMBL:
human STS SHGC-10164.
Score = 1592, P = 1.5e-64, identities = 346/364

Entry HS933343 from database EMBL: 
human STS WI-16551. 
Score = 1193, P = 5.7e-46, identities = 241/244 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 


1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL
51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN
101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT
151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS
201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN
251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV
301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA
351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL
401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV
451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN
501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL
551 FACGPDAKEV RKENQANTSV V


BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphute1_24j6, frame 3

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds., N
= 1, Score = 1472, P = 7.2e-151

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437, P
= 2.8e-60

TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid
R09B5., N = 2, Score = 323, P = 1.5e-43


>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.

Length = 405

HSPs:

Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151
Identities = 288/319 (90%), Positives = 297/319 (93%)


Query:   1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60
           MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
Sbjct:   1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60

Query:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120
           TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL
Sbjct:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120

Query: 121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180
           L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI
Sbjct: 121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180

Query: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240
           DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK
Sbjct: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240

Query: 241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300
            EE+ELKQL   KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV
Sbjct: 241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300

Query: 301 SYYNQPVFLAGMGLAF-LY 318
           SYYNQPVFL   G  F LY
Sbjct: 301 SYYNQPVFLGWHGPGFPLY 319


Pedant information for DKFZphute1_24j6, frame 3


Report for DKFZphute1_24j6.3


[LENGTH]   571
[MW]       62542.72
[pI]       6.08
[HOMOL]    TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141
[BLOCKS]   BL00341D
[PROSITE]  MYRISTYL 15
[PROSITE]  MITOCH_CARRIER 1
[PROSITE]  CK2_PHOSPHO_SITE 6
[PROSITE]  PROKAR_LIPOPROTEIN 1
[PROSITE]  PKC_PHOSPHO_SITE 4
[PROSITE]  ASN_GLYCOSYLATION 4
[PFAM]     Laminin B (Domain IV)
[KW]       TRANSMEMBRANE 4
[KW]       LOW COMPLEXITY 8.76 %

SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
SEG
PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee
MEM MMMMMMMMMMMMM

SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL
SEG xxxxxxxxxxxxxxxx
PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh
MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI
SEG xxxxxxxxxxxxxxxxxxxxx
PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh
MEM MMMMMMM

SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK
SEG
PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh
MEM

SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV
SEG
PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee
MEM

SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF
SEG
PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh
MEM

SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP
SEG xxx
PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc
MEM

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL
SEG xxxxxxxxxx
PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMM

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR
SEG
PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV
SEG
PRD eecccccceeeeccccchhhhhhhhcccccc
MEM


Prosite for DKFZphute1_24j6.3


PS00001   100->104   ASN_GLYCOSYLATION    PDOC00001
PS00001   174->178   ASN_GLYCOSYLATION    PDOC00001
PS00001   434->438   ASN_GLYCOSYLATION    PDOC00001
PS00001   567->571   ASN_GLYCOSYLATION    PDOC00001
PS00005   23->26     PKC_PHOSPHO_SITE     PDOC00005
PS00005   176->179   PKC_PHOSPHO_SITE     PDOC00005
PS00005   294->297   PKC_PHOSPHO_SITE     PDOC00005
PS00005   487->490   PKC_PHOSPHO_SITE     PDOC00005
PS00006   16->20     CK2_PHOSPHO_SITE     PDOC00006
PS00006   36->40     CK2_PHOSPHO_SITE     PDOC00006
PS00006   294->298   CK2_PHOSPHO_SITE     PDOC00006
PS00006   396->400   CK2_PHOSPHO_SITE     PDOC00006
PS00006   403->407   CK2_PHOSPHO_SITE     PDOC00006
PS00006   445->449   CK2_PHOSPHO_SITE     PDOC00006
PS00008   12->18     MYRISTYL             PDOC00008
PS00008   65->71     MYRISTYL             PDOC00008
PS00008   76->82     MYRISTYL             PDOC00008
PS00008   193->199   MYRISTYL             PDOC00008
PS00008   267->273   MYRISTYL             PDOC00008
PS00008   311->317   MYRISTYL             PDOC00008
PS00008   336->342   MYRISTYL             PDOC00008
PS00008   339->345   MYRISTYL             PDOC00008
PS00008   353->359   MYRISTYL             PDOC00008
PS00008   368->374   MYRISTYL             PDOC00008
PS00008   373->379   MYRISTYL             PDOC00008
PS00008   435->441   MYRISTYL             PDOC00008
PS00008   461->467   MYRISTYL             PDOC00008
PS00008   490->496   MYRISTYL             PDOC00008
PS00008   494->500   MYRISTYL             PDOC00008
PS00013   122->133   PROKAR_LIPOPROTEIN   PDOC00013
PS00215   404->414   MITOCH_CARRIER       PDOC00189


Pfam for DKFZphute1_24j6.3


HMM_NAME Laminin B (Domain IV)

HMM *YWRlPERFLGDQvTsYGGkLe*

Y+R + LG+++ + G + +
Query 538 YFRFAQNTLGNKLFACGPDAK


DKFZphute1_2h3


group: differentiation/development 

DKFZphute1_2h3 encodes a novel 267 amino acid protein with similarity to ITM2 (integral
membrane protein 2) of chicken and mouse.

The novel protein contains a prenyl group binding site (CAAX box) and seems to be
post-translationally modified by the attachment of either a farnesyl or a geranyl-geranyl group.
The similar Gallus g. protein E25 is a marker for chondro-osteogenic differentiation.

The new protein can find application as a marker for chondro-osteogenic cell
differentiation and for the modulation of chondro-osteogenic cell differentiation.


strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2033 bp

Poly A stretch at pos. 2007, polyadenylation signal at pos. 1986


1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCGGCG 
51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG CATCAAGGGC 
101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC 
151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT 
201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG 
251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA 
301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC 
351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA 
401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC 
451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC
501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC 
551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT 
601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC 
651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG 
701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG 
751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG 
801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG 
851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC 
901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTGC TTAGCTTGTA 
951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC 
1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT 
1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT 
1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG 
1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG 
1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC 
1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG 
1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTGGGGCTT GAGCTGAGGA 
1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC 
1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG 
1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT
1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC 
1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA 
1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC 
1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA 
1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT 
1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG CTTAGGGGAA 
1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT 
1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC 
1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT 
1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT 
2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG 


BLAST Results 


Entry B64417 from database EMBL: 

CIT-hsp-2023A7.tr cit-HSP Homo sapiens genomic clone 2023A7. 

Length = 715 
Plus Strand HSPs:
Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64
Identities = 310/311 (99%)


Medline entries 


96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA 
library subtraction. 

Molecular cloning and characterization of a gene belonging to a novel
multigene family of integral membrane proteins.


Peptide information for frame 2 


ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 


1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK
51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI
251 RHFENTFVVE TLICGVV
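
The peptide of a given frame follows from the cDNA sequence by reading codons from the ORF start position. The sketch below illustrates this with the standard genetic code; the use of the standard code, the function name `translate`, and the convention of writing X for an unreadable codon are assumptions made for illustration and are not taken from this listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[index]
    for index, codon in enumerate(product(BASES, repeat=3))
}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate dna from the 1-based position start_bp up to the first stop.

    Whitespace in the input (for example the blocks of ten bases used in
    this listing) is ignored. Codons containing unknown characters are
    written as 'X'.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


if __name__ == "__main__":
    # Bases 51-80 of the cDNA above; the ORF starts at bp 56,
    # which is the sixth base of this fragment.
    print(translate("CAGCCATGGT GAAGATTAGC TTCCAGCCCG", start_bp=6))
```

Applied to bases 51 to 80 of the cDNA above, the sketch yields MVKISFQP, the first eight residues of the peptide.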

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphute1_2h3, frame 2

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16)., N = 1, Score = 573, P = 1.3e-55

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N =
1, Score = 560, P = 3.2e-54

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1,
Score = 456, P = 3.3e-43


>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16).

Length = 262

HSPs:

Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55
Identities = 117/264 (44%), Positives = 172/264 (65%)

Query: 1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60

MVK+SF A+A + A+K ++ ++L+ P ++P G 

Sbjct: 1 MVKVSFNSALA—HKEAANKEEENS QVLILPPDAKEPEDWVPAGHKRAWCWC 51 

Query: 61 LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTQM— 112 

+ G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 

Sbjct: 52 MCFGLAFMLAGVILGGAYLYKYFAFOQ GGVYFCGIKYIEDGLSLPESGAQLKSARYH 108 

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172 

+E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT++ 
Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168 

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232 

V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R 
Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228 

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 

+ + I KR A NC IRHFEN F +ETLIC 
Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 


Pedant information for DKFZphute1_2h3, frame 2


Report for DKFZphute1_2h3.2


[LENGTH]   267
[MW]       30253.96
[pI]       8.16
[HOMOL]    SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16). 1e-49
[PROSITE]  MYRISTYL 4
[PROSITE]  PRENYLATION 1
[PROSITE]  CAMP_PHOSPHO_SITE 3
[PROSITE]  CK2_PHOSPHO_SITE 3
[PROSITE]  TYR_PHOSPHO_SITE 1
[PROSITE]  PKC_PHOSPHO_SITE 4
[PROSITE]  ASN_GLYCOSYLATION 1
[KW]       TRANSMEMBRANE 1
[KW]       LOW COMPLEXITY 15.36 %


SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh

MEM MMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI

SEG xxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM


SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN

SEG xxxxxxxxxxxx

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV

SEG xx

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 


Prosite for DKFZphute1_2h3.2


PS00001   169->173   ASN_GLYCOSYLATION   PDOC00001
PS00004   50->54     CAMP_PHOSPHO_SITE   PDOC00004
PS00004   187->191   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   232->236   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   49->52     PKC_PHOSPHO_SITE    PDOC00005
PS00005   209->212   PKC_PHOSPHO_SITE    PDOC00005
PS00005   227->230   PKC_PHOSPHO_SITE    PDOC00005
PS00005   235->238   PKC_PHOSPHO_SITE    PDOC00005
PS00006   30->34     CK2_PHOSPHO_SITE    PDOC00006
PS00006   110->114   CK2_PHOSPHO_SITE    PDOC00006
PS00006   209->213   CK2_PHOSPHO_SITE    PDOC00006
PS00007   119->127   TYR_PHOSPHO_SITE    PDOC00007
PS00008   52->58     MYRISTYL            PDOC00008
PS00008   71->77     MYRISTYL            PDOC00008
PS00008   138->144   MYRISTYL            PDOC00008
PS00008   243->249   MYRISTYL            PDOC00008
PS00294   264->268   PRENYLATION         PDOC00266


(No Pfam data available for DKFZphute1_2h3.2)


DKFZphmcf1_1a11


group: transmembrane protein

DKFZphmcf1_1a11 encodes a novel 393 amino acid protein with weak similarity to S.pombe
SPBC29A3_3 protein and S. cerevisiae putative membrane protein YDR255c.

The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of mammary carcinoma-
specific genes and as a new marker for mammary carcinoma cells.


similarity to YDR255c and SPBC29A3.03c
membrane regions: 1

Summary DKFZphmcf1_1a11 encodes a novel 393 amino acid protein, with
similarity to YDR255c and SPBC29A3.03c.


similarity to YDR255c and SPBC29A3.03c

complete cDNA, complete cds, EST hits

potential start at Bp 110 matches Kozak consensus

Sequenced by DKFZ

Locus: /map="542.7 cR from top of Chr5 linkage group"
Insert length: 1819 bp

Poly A stretch at pos. 1808, no polyadenylation signal found


1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT 
551 GTGGACTTGG ATTTCAACCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA 
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC 
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC 
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA 
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC 
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG 
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT 
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG 
951 AGTCCCCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG 
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG 
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT 
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC 
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA 
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT 
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGrCT 
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG 
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT 
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA 
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG 
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT 
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC 
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG 
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC 
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA 
1801 ACAAATGTAA AAAAAAAAA 


BLAST Results 


Entry HS579359 from database EMBL: 
human STS WI-6350. 
Score = 1027, P = 9.9e-40, identities = 207/209


Medline entries


No Medline entry 


Peptide information for frame 2 


ORF from 110 bp to 1288 bp; peptide length: 393
Category: similarity to unknown protein


1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG 
51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE 
101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL
151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH 
201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS 
251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN 
301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP 
351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1a11, frame 2

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03C"; product: "hypothetical
protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P =
3.4e-42

PIR:S67312 probable membrane protein YDR255c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 271, P = 5.3e-22

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid
T07D1., N = 1, Score = 193, P = 5.6e-13


>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03C"; product: "hypothetical protein";
S.pombe chromosome II cosmid c29A3.
Length = 398

HSPs:

Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42
Identities = 55/142 (38%), Positives = 89/142 (62%)

Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311 

Y +LD W + F R+ C+ LG+S+ESPL + +G +ALP+L+ + 
Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIWNAGAIALPILLKMSSIMKKKHTE 316 

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371 

W + ELP+EI L +HSVF CP+ ++Q ++ NPP+ + CGHVI +++L +L 

Sbjct: 317 — WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374 

Query: 372 G— KLKCPYCPMEQNPADGKRIIF 393 

G + KCPYCP E AD R+ F 
Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398 

Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42
Identities = 51/221 (23%), Positives = 102/221 (46%)

Query: 22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81 

GCLEL +++L+P++LVCK+ L K 

Sbjct: 15 GNKCLAKLNEL ESILKDAKKSCLKD-PTTSMKELVA—CSEKTQQVFDDLKRTEKK 67 

Query: 82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141 

H+S++R GK +++ F+ ++ + + +++++++ + A+ H ++QG + +A C 
Sbjct: 68 FHTSLNRFGKTLEKKFNFDLEDIKLHSSFESKKRE— IDTALSLHFFRQGDVELAHLFC 124 

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201 

+E+ + + F L I++ + +EWA R L SSLE+ L + 

Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184 

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242 
+ K+A+YR+ F + H +IQ M +L + 


Sbjct: 185 VSNYL— TTKDIMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225 

Pedant information for DKFZphmcf1_1a11, frame 2


Report for DKFZphmcf1_1a11.2


[LENGTH]   393
[MW]       44414.77
[pI]       6.15
[HOMOL]    TREMBL:SPBC29A3_3 gene: "SPBC29A3.03C"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YDR255c] 8e-23
[PIRKW]    transmembrane protein 2e-21
[PROSITE]  MYRISTYL 2
[PROSITE]  AMIDATION 1
[PROSITE]  CK2_PHOSPHO_SITE 3
[PROSITE]  PROKAR_LIPOPROTEIN 1
[PROSITE]  TYR_PHOSPHO_SITE 3
[PROSITE]  PKC_PHOSPHO_SITE 1
[PROSITE]  ASN_GLYCOSYLATION 1
[KW]       TRANSMEMBRANE 1


SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 
PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 
MEM MMMMMMMMMMMMMMMMMMMMMMM

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhccccccccccccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF

PRD eehhhhhhhccccccccccccccchhhhhcccc 


Prosite for DKFZphmcf1_1a11.2


PS00001   189->193   ASN_GLYCOSYLATION    PDOC00001
PS00005   180->183   PKC_PHOSPHO_SITE     PDOC00005
PS00006   28->32     CK2_PHOSPHO_SITE     PDOC00006
PS00006   135->139   CK2_PHOSPHO_SITE     PDOC00006
PS00006   190->194   CK2_PHOSPHO_SITE     PDOC00006
PS00007   211->219   TYR_PHOSPHO_SITE     PDOC00007
PS00007   27->36     TYR_PHOSPHO_SITE     PDOC00007
PS00007   244->253   TYR_PHOSPHO_SITE     PDOC00007
PS00008   37->43     MYRISTYL             PDOC00008
PS00008   50->56     MYRISTYL             PDOC00008
PS00009   387->391   AMIDATION            PDOC00009
PS00013   282->293   PROKAR_LIPOPROTEIN   PDOC00013


(No Pfam data available for DKFZphmcf l_la 11 .2) 
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DKFZphmcfl_lc23 


group: mammary carcinoma derived 

DKFZphmcf1_1c23.1 encodes a novel 311 amino acid proline-rich protein.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of mammary
carcinoma-specific genes.


unknown, proline rich protein 

complete cDNA, complete cds? potential start at Bp 50, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 3077 bp 

Poly A stretch at pos. 3067, polyadenylation signal at pos. 3048 
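
The two positions quoted above can be checked against the printed sequence with a short script. The following is a minimal sketch, not the annotation pipeline actually used: the function names, the set of signal hexamers (AATAAA and the ATTAAA variant) and the minimum run length of eight A residues are assumptions introduced for illustration. Positions are reported as zero-based offsets, because that convention appears to reproduce the quoted figures; the listing itself does not state which convention it uses.

```python
import re

# Canonical hexamer plus its most common variant (an assumed, non-exhaustive set).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq, offset=0, signals=POLYA_SIGNALS):
    """Return sorted (position, hexamer) pairs; position = offset + 0-based index."""
    seq = seq.upper()
    hits = []
    for sig in signals:
        idx = seq.find(sig)
        while idx != -1:
            hits.append((offset + idx, sig))
            idx = seq.find(sig, idx + 1)
    return sorted(hits)

def find_polya_stretch(seq, offset=0, min_len=8):
    """Return offset + 0-based index of a terminal run of >= min_len A's, else None."""
    match = re.search(r"A{%d,}$" % min_len, seq.upper())
    return offset + match.start() if match else None

# Final 77 bases of the insert (rows 3001 and 3051 of the listing, spaces removed).
TAIL = ("GCCTGAGTGAGACACATGAAGAAAACTGTGTTTCATTTAAAGATGTTAAT"
        "TAAATGATTGAAACTTGAAAAAAAAAA")

print(find_polya_signals(TAIL, offset=3000))
print(find_polya_stretch(TAIL, offset=3000))
```

Passing offset=3000 makes the reported positions refer to the full 3077 bp insert rather than to the excerpt.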


1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT 
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG 
101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT 
151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA 
201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC 
251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT 
301 GGGCCTCCCA GGGAGGACGT AGGTGCGCCC CTGGTCACGC CCTCGCTCCT 
351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC 
401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA 
451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT 
501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC 
551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC 
601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA 
651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG 
701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG 
751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC 
801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC 
851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG 
901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT 
951 CCCGGGCTCA GACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC 
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT 
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC 
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC 
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT 
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA 
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG 
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC 
1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT 
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG 
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT 
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA 
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG 
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT 
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC 
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG 
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT 
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT 
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG 
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT 
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT 
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT 
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC 
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT 
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC 
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC 
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT 
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC 
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG 
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT 
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT 
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC 
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC 
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT 
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA 
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA 
2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC 
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG 
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAA/^GC CTCTGATTTT 
2901 TAATTTCCAT AA/U^TGTTAG AAGTATATAT ATACATATAT ATATTTCTTT 
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA 
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT 
3051 TAAATGATTG AAACTTGAAA AAAAAAA 


BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 1 


ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 
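
The ORF coordinates and the peptide length are tied together by simple arithmetic: an ORF given by inclusive 1-based coordinates that ends on the last sense codon encodes (end - start + 1) / 3 residues. The sketch below checks this and translates the first ten codons of the ORF with the standard genetic code. The helper names are hypothetical, and the 30-base excerpt is copied from positions 49 to 78 of the listing.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def orf_peptide_length(start_bp, end_bp):
    """Residues encoded by an ORF with inclusive 1-based coordinates (stop excluded)."""
    length = end_bp - start_bp + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3

def translate(dna):
    """Translate complete codons of a DNA string; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Positions 49-78 of the cDNA listing (start of the ORF).
ORF_START = "ATGGCTGACTTCCCCCCACCAGAGGAGGCT"

print(orf_peptide_length(49, 981))
print(translate(ORF_START))
```

The same length check applies to any entry that states an ORF range and a peptide length.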


1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 
51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 
101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 
151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 
201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 
251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 
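
The description of this peptide as proline rich can be made concrete by counting residues. The following minimal sketch computes the amino acid composition of the peptide listed above; the function names are hypothetical, and residues 141 to 150 are read as PAPSSGLHAA, following the BLASTP alignments of the same query.

```python
from collections import Counter

# Peptide of frame 1, copied block by block from the listing; whitespace is removed.
PEPTIDE = "".join("""
MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP
DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL
LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA
VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL
ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV
PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG
LPGSDSQKEL A
""".split())

def composition(seq):
    """Return a Counter mapping each residue to its number of occurrences."""
    return Counter(seq)

def residue_fraction(seq, residue):
    """Return the fraction of positions in seq occupied by the given residue."""
    return seq.count(residue) / len(seq)

print(len(PEPTIDE))
print(composition(PEPTIDE).most_common(3))
print(round(100 * residue_fraction(PEPTIDE, "P"), 1), "% proline")
```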

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1c23, frame 1 

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P = 
6.1e-15 

PIR:A28996 proline-rich protein M14 precursor - mouse, N = 1, Score = 
191, P = 3.8e-13 

>PIR:S49915 extensin-like protein - maize 
Length = 1,188 

HSPs: 

Score = 215 (32.3 bits). Expect = 6.1e-15, P = 6.1e-15 
Identities = 81/269 (30%), Positives = 115/269 (42%) 

Query: 5 PPPEEAFFS VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP-- DPPP- 

PPP S V SP P P SP PA +SS ++ PP +P PPP 

Sbjct: 598 

Query: 56 

PPPPASP +P P K PP + +P+PS + P 

Sbjct: 655 PPPPPPAKSTPPP-EEYPT--PPTSVKSSPPPEKSLPPPTLIPSPPPQEKPTPPSTPSKP 711 

Query: 116 PTPALGPSAPQKPLRRA-LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P+ PS P++P+ + ++SP PAP S +LA S + + PP 

Sbjct: 712 

Query: 175 

PP +P +S +Q+ P +P++ L V+ + + PP AP 

Sbjct: 772 SPPPAPQVKSS PPPVQVSSPPPAPKSSPPLAP -- VSSPPQVEKTSPPPAPL 823 

Query: 234 

SP P + P V V PPP S P P+++PP P 

Sbjct: 824 SPPLAPK-SSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKP 

Score = 206 (30.9 bits). Expect = 9.1e-14, P = 9.1e-14 
Identities = 82/261 (31%), Positives = 108/261 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P P P P +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSWSPPPPVKSPPPPAPVG SPP~PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVARKPS- 247 

+ S L PP++ VA +PPPSPP PVA P 

Sbjct: 578 PVKSPPPPTLVASPP--PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635 

Query: 248 VGVPPP ASPSYPRAEPLTAPPTNGLPHTQD 277 

+ PPP +SP P P PP P ++ 
Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669 

Score = 202 (30.3 bits). Expect = 2.9e-13, P = 2.9e-13 
Identities = 81/254 (31%), Positives = 110/254 (43%) 

Query: 16 SPEPAGPSGSPELV— SSP— AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70 

SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+ 
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVWSSPPPWKSSPPPAPVSSPPLTPKPA SPPAHVS 872 

Query: 71 KLPQ KEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQ 126 

P+ P + PP E +P TP L ++S P +P + P + 

Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932 

Query: 127 KPLRRAL SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

P+ + + ++SP PAP S A K+ ALP PPE + PP +P 
Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP— KSSPPPAPVNL P — PPEVKSSPPPTP 984 

Query: 184 ASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVA 243 

S+ + P PE ++ V+ + PP AP SP PPPV 

Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PPPVK 1042 

Query: 244 RKPS VGVPPPASPSYPRAEPLTAPP 268 

P V PPP S P P+++PP 
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070 

Score = 190 (28.5 bits). Expect = 7.9e-12, P = 7.9e-12 
Identities = 74/264 (28%), Positives = 111/264 (42%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63 

PPP S PE + P P +P + T+++ PP PP p+p 

Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698 

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123 

P K P K PP+E V +P TP V +P PTP P 

Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK—SSPPPAPVSSP— PPTPVSSPP 753 

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

A P+ S ++SP PAP S A ++K+ + + + P PP + PP +P 
Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP—PPAPKSSPPLAP 806 

Query: 184 ASTASFIFSKGSRKLQLERP-VSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPV 242 

S+ + LP ++P++ +V+ + + PP AP SP P 

Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP— HWVSSPPPWKSSPPPAPVSSPPLTPKP 864 

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268 

A P+ V PP P++P p +EP ++PP 

Sbjct: 865 ASPPAHVSSPPEWKPSTPPAPTTVISPPSEPKSSPP 901 


Score = 189 (28.4 bits). Expect = 1.0e-11, P = 1.0e-11 
Identities = 86/271 (31%), Positives = 112/271 (41%) 


Query: 5 PPPEEAFFSVASPEPAGPSGSPEL-VSSP— AASSSSATALQIQPPG— SPDPPPAP 56 

PPP AS P P S P + VSSP A SS A PP PPPAP 
Sbjct: 768 PPP--APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 825 

Query: 57 PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 116 

P AP SS P V P PV S PP V +P +TP V +P 

Sbjct: 826 PPLAPKSSPPHWVSSPP — PWKSS PPPAPVSSPPLTPKPASPPA— HVSSPPEVV 878 

Query: 117 TPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKAC-SLAASEGL— SSAQP— 169 
P+ P AP + +-^5? P P S V+ ++ +S + SS P 
Sbjct: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPWV 937 

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228 

+ PP + PP +P S+ + P PE ++ V+ + P 

Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996 

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P AP SP PPP + P V PPP S P P+++PP 

Sbjct: 997 PPAPMSSP--PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038 


Score = 181 (27.2 bits). Expect = 8.8e-11, P = 8.8e-11 
Identities = 73/277 (26%), Positives = 105/277 (37%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP--PA 55 

V P S SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKL PQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111 

PPAP + SPV++ PKP + GPP+ P P ++S 

Sbjct: 525 PPAPIGSPSPPPPVSWSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584 

Query: 112 PG-- GAPTPALGPSAPQKPLRRA LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166 

P +PP+ PP+ + PPS AV ++ + 

Sbjct: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644 

Query: 167 AQPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQ 226 

PPE P PP PA + + ++ PE L+ + 

Sbjct: 645 VSSPPPPEKSP-PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702 

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP P K P +P P K V PP S P P+++PP 

Sbjct: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745 


Score = 177 (26.6 bits). Expect = 2.6e-10, P = 2.6e-10 
Identities = 78/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP-- DPPPAP-- 56 

PPP +P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEWK-PSTPPAPTTV-- ISPPSEPKSSPPPTPVS 906 

Query: 57 -PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

P P SS P + P P PP V +P P++ V +P 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPWVSSP-- PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P + P+ P ++SP P P S A+ S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV--- KSSPPPTPVSSPPPAPK SSPPPAPMSSP-P-- PPEV 1009 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

+ PP +P S+ + P P ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 

Query: 236 PKAPPPVARKPS-- VGVPPPASPSYPRAEPLTAPP 268 

P PPPV P V PPP S P P+++PP 

Sbjct: 1069 P-- PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 


Score = 177 (26.6 bits). Expect = 2.6e-10, P = 2.6e-10 
Identities = 82/267 (30%), Positives = 110/267 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS-- SSATALQIQPPGSPDPPPAP-- PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P PPP +P 

Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 

Sbjct: 529 I GSPSP-PPPVSWSPPPPVKSPPPPAPVG SPP-- PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPV SPETQADLQRNLVAELRS ISEQRPPQA PK 233 

+ S L P SPA+ + ++S ++ PP P 

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636 

Query: 234 KSPKAPPPVARKP SVGVPPPASPSYPRAEPLTAPPTN 270 

KSP P PV+ P PPP + S P E PPT+ 

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTS 676 


Score = 170 (25.5 bits). Expect = 1.6e-09, P = 1.6e-09 
Identities = 78/279 (27%), Positives = 108/279 (38%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64 

PP S S + P +P + P SS A+ PP +P +PP p SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS--SPP-PVVVSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG—GAPTPALGP 122 

P V P PV PP +P P L ++S P +P PA 

Sbjct: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG--RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

Sbjct: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS— P— PPPVKSPPP 1046 

Query: 181 QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 240 
+ P 3+ + P P ++ V+ + PP AP SP PP 

Sbjct: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP— PP 1103 

Query: 241 PVARKPS VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA PS P P+ + + PP P + ++ L 
Sbjct: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits). Expect = 2.1e-09, P = 2.1e-09 
Identities = 75/266 (28%), Positives = 104/266 (39%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP— PPPEKSPPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P P P P +♦ P PAP + V+ S ++S P P + 

Sbjct: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP— PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK — PSVGVPPPASPSYPRA—EPLTAPP 268 

P +PPP + PS PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits). Expect = 2.7e-09, P = 2.7e-09 
Identities = 75/267 (28%), Positives = 102/267 (38%) 

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

Sbjct: 496 ASTPPP— SLVKLSPPQAPVGSPPPPVKTTSPPAPIGSPSPPPPVSWSPPPPVKSPPPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

Sbjct: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

Sbjct: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 179 PP— QSPASTASFIFSKGSRKLQLERPV SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP^^PSSKLPSPQ S ++P +P 

Sbjct: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP — SPP 721 

Query: 234 KSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
Sbjct: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits). Expect = 4.6e-09, P = 4.6e-09 
Identities = 81/268 (30%), Positives = 108/268 (40%) 

Query: 5 PPPEEAF— FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP— 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 — APPAPAPASSAPGHVAKLPQKEPVGC SKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+ PP PAP +S+P + P PV K PP P ++S 

Sbjct: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679 

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P + PPLPSP P + + ++P PSS + + S SS 
Sbjct: 680 SPPPEKSLPPPTLIPSPP— PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 
Query: 168 QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227 

P PPSP + A+ SSK P+P+ ++ + 

Sbjct: 737 PPAPVSSPPPTPVSSPPALAP-VSSPPSVKSS-- PPPAPLSSPPPAPQVKSSPPPVQVSS 793 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP APK SP P+A P V PP + P PL++PP 

Sbjct: 794 PPPAPKSSP PLA--P-VSSPPQVEKTSPPPAPLSSPP 827 


Score = 165 (24.8 bits). Expect = 6.0e-09, P = 6.0e-09 
Identities = 79/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAG-PSGSP— ELVSSPAASSSSATALQIQPPGSPDPPP-APPAPA 60 

PPP + + + P P G PS P +VS P S P GSP PP +PP PA 

Sbjct: 517 PPPVK—TTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPPA-— PVGSPPPPEKSPPPPA 570 

Query: 61 PASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAP— LVTPSLLQMVRLRSVGAPGG 114 

P +S P V P V PP V +P + +P V AP 

Sbjct: 571 PVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVA 630 

Query: 115 APTPALGPSAPQKPLRRALSGRASPVPAP SSGLHAAVRLKACSLAASEGLSSAQPNG 171 

+ P + P P+ SP P P S+ S+ +S + P 

Sbjct: 631 SSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP— 688 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQA 231 

PP P PP T SK P SPE + + v+ + PP A 

Sbjct: 689 PPTLIPSPPPQEKPTPPSTPSKP PSSPEKPSP-PKEPVSSPPQTPKSSPPPA 739 

Query: 232 PKKSPKAPPPVARKPSVGV— PPPASPSYPRAEPLTAPP 268 

P SP P PV+ P++ PP+ S P PL++PP 

Sbjct: 740 PVSSPP-PTPVSSPPALAPVSSPPSVKSSPPPAPLSSPP 777 

Score = 162 (24.3 bits). Expect = 1.3e-08, P = 1.3e-08 
Identities = 76/272 (27%), Positives = 99/272 (36%) 

Query: 2 ADFPPPEEAFFSVASPEPAG-PSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 60 

A P P SPEP PSP P + SA PPPP +PPA + 

Sbjct: 427 ASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPADDYVPPTPPVPGKSPPATS 486 

Query: 61 PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTP" 118 

P+ A P V S PP+ VG+P P . V+ S AP G+P+P 

Sbjct: 487 PSPQVQPPAASTPPPSLVKLS PPQAPVGSP~PPP VKTTSPPAPIGSPSPPP 536 

Query: 119 ALGPSAPQK-PLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

+ PPKPAGSPPS A S ++PP 

Sbjct: 537 PVSVVSPPPPVKSPPPPAPVG-- SPPPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPP 594 

Query: 175 AEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKK 234 

+ PP +P ++ + P P A + + PP P+K 

Sbjct: 595 VKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTPVSSPPP-PEK 653 

Query: 235 SPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPPTNGLP 273 

SP PPP P PP P+ P + + PP LP 

Sbjct: 654 SPPPPPPAKSTP PPEEYPTPPTSVKSSPPPEKSLP 688 


Score = 159 (23.9 bits). Expect = 2.8e-08, P = 2.8e-08 
Identities = 77/264 (29%), Positives = 103/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP-- DPPPAP PAP 59 

PPP V+SP P P SP P SS ++ PP +P PP P P P 

Sbjct: 916 PPPA MVSSP-PMTPKSSPP PWVSSPPPTVKSSPPPAPVSSPPATPKSSPPP 966 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP + P V P PV S P AP+ +P + V+ AP +P P 

Sbjct: 967 APVNLPPPEVKSSPPPTPVS-SPPPAPKSSPPPAPMSSPPPPE-VKSPPPPAPVSSPPPP 1024 

Query: 120 LGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEG--- LSSAQPNGPPEA 175 

+ P P+ ++ P PAP S V+ S +SPP + 

Sbjct: 1025 VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP A S ++ P P A + A ++ S PP AP S 

Sbjct: 1085 PPPPVKSPPPPAPV SSPPPPIKSPPPP APVSSPPPAPVKPPS-- LPPPAPVSS 1135 

Query: 236 PK-- APPPVARKPSVGVPPPA-SPSYPRAEPLTAPP 268 

P P+K +PPPA S P + PP 

Sbjct: 1136 PPPVVTPAPPKKEEQSLPPPAESQPPPSFNDIILPP 1171 


Score = 143 (21.5 bits). Expect = 1.8e-06, P = 1.8e-06 
Identities = 59/179 (32%), Positives = 77/179 (43%) 


WO 01/12659 PCT/IB00/01496 


Query: 3 DFPPPEEAFFSVASPEP-AGPSGSPELVSSPAASSSSATA-LQIQPPGSP--DPPP-- -A 55 

+ PPPE S P P + P +P+ PA SS ++ PP +p PPP + 

Sbjct: 970 NLPPPEVK— SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 1027 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PP PAP SS P V p PV PP + P S V+ AP + 

Sbjct: 1028 PPPPAPVSSPPPPVKSPPPPAPVSSPP PPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 116 PTPALGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P P + P P+ ++ P PAP S A +K SL +SS P PP 

Sbjct: 1085 PPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPPAP-VKPPSLPPPAPVSS — P— PPV 1139 

Query: 175 AEPRPPQ 181 
P PP+ 

Sbjct: 1140 VTPAPPK 1146 

Score = 133 (20.0 bits). Expect = 2.3e-05, P = 2.3e-05 
Identities = 50/132 (37%), Positives = 59/132 (44%) 

Query: 1 MADFPPPEEAFFSVASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP — DPPP 54 

M+ PPPE V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1001 MSSPPPPE VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPP 1055 

Query: 55 — APPAPAPASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP SS P V P PV PP V +P P + 

Sbjct: 1056 PVKSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP—PPPIKSPPPPAP 1113 

Query: 109 VGAPGGAPT--PALGPSAP 125 

V +P AP P+L P AP 
Sbjct: 1114 VSSPPPAPVKPPSLPPPAP 1132 


Score = 110 (16.5 bits). Expect = 8.0e-03, P = 8.0e-03 
Identities = 41/121 (33%), Positives = 49/121 (40%) 


Query: 5 PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP--DPPP 54 

PPP S V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1060 PPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPP 1119 

Query: 55 AP PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

AP P PAP SS P V P K+ + PP E P +L + 

Sbjct: 1120 APVKPPSLPPPAPVSSPPPWTPAPPKKE EQSLPPPAESQPPPSFNDIILPPIMANK 1176 

Query: 109 VGAP 112 
+P 

Sbjct: 1177 YASP 1180 


Score = 108 (16.2 bits). Expect = 1.3e-02, P = 1.3e-02 
Identities = 46/155 (29%), Positives = 67/155 (43%) 


Query: 114 GAPTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVR-LKACS-LAASEGLSSAQPNG 171 

G PTP GP + P A S +P+P+P + -t-LS-l-A-t- P+ 
Sbjct: 408 GYPTPGGGPPSSPVPGKPAAS APMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHS 464 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQ ADLQRNLVAELRSISEQR 227 

PP + PP P S + S ++Q +P + Q + + + 

Sbjct: 465 PPADDYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP AP SP PPPV SV PPP s P P+ +PP 
Sbjct: 525 PP-APIGSPSPPPPV SWSPPPPVKSPPPPAPVGSPP 560 
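
The percentages quoted in the HSP headers above can be recomputed from the accompanying counts. The sketch below parses a header line of the form "Identities = a/n (p%), Positives = b/n (q%)" and checks both percentages. It assumes the percentages are truncated rather than rounded, which is an inference from the figures in this listing (for example 105/277 is printed as 37%), and the regular expression and function names are hypothetical.

```python
import re

# Pattern for the "Identities ... Positives ..." line of an HSP header.
HSP_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)

def parse_hsp_header(line):
    """Return (identities, positives, aligned_length, ident_pct, pos_pct) as ints."""
    match = HSP_RE.search(line)
    if match is None:
        raise ValueError("not an Identities/Positives header: %r" % line)
    ident, n1, ident_pct, pos, n2, pos_pct = map(int, match.groups())
    if n1 != n2:
        raise ValueError("alignment lengths differ within one header")
    return ident, pos, n1, ident_pct, pos_pct

def percentages_consistent(line):
    """True if both printed percentages equal the truncated ratio of the counts."""
    ident, pos, n, ident_pct, pos_pct = parse_hsp_header(line)
    return (100 * ident // n == ident_pct) and (100 * pos // n == pos_pct)

for header in (
    "Identities = 81/269 (30%), Positives = 115/269 (42%)",
    "Identities = 73/277 (26%), Positives = 105/277 (37%)",
):
    print(header, "->", percentages_consistent(header))
```

A check of this kind is mainly useful for spotting digits that were damaged during text extraction.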


Pedant information for DKFZphmcf1_1c23, frame 1 


Report for DKFZphmcf1_1c23.1 


[LENGTH] 311 

[MW] 31534.58 

[pI] 9.48 

[KW] All_Alpha 

[KW] LOW COMPLEXITY 38.59 % 


SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxx xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 

SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL 

SEG xxxxxx xxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc 

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccc^ 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 

PRD ccccccceeeecccchhhhh^cccc^chhhhhhhhh^^^^^ 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx c-c-m-io 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceee^ 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 

(No Prosite data available for DKFZphmcf1_1c23.1) 
(No Pfam data available for DKFZphmcf1_1c23.1) 


DKFZphmcf1_1e15 


group: transmembrane protein 

DKFZphmcf1_1e15 encodes a novel 454 amino acid protein with similarity to C. elegans proteins 
and transporter proteins. 

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and 
the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unknown 
compound. 

The new protein can find application as a new transporter in eukaryotic cells, e.g. in drug 
transport into cells. 

similarity to D-XYLOSE TRANSPORTER 
membrane regions: 9 

complete cDNA, complete cds, EST hits 

matches cDNA encoding cell growth inhibiting factor (E12646) 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1957 bp 

Poly A stretch at pos. 1947, polyadenylation signal at pos. 1929 


1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 
51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 
101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 
151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 
201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 
251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 
301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 
351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 
401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 
451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC 
501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 
551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 
601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 
651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 
701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 
751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC 
801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 
851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 
901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 
951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 
1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 
1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT 
1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC 
1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 
1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 
1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 
1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 
1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 
1401 GCTGGGTGAT CCTGGGAGCC CCTACCTCAT TGGCCTGATC TCTGACCGCC 
1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 
1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 
1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 
1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 
1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 
1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 
1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC 
1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 
1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 
1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 


BLAST Results 


Entry E12646 from database EMBL: 

cDNA encoding cell growth inhibiting factor. 

Score = 3046, P = 2.2e-131, identities = 640/659 

Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 


1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI 
51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY 
101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP 
151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV 
201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI 
251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL 
301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV 
351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG 
401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA 
451 SVLI 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1e15, frame 1 

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4, 
N = 3, Score = 441, P = 5.2e-76 

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid 
C39E9, N = 2, Score = 449, P = 8.2e-69 

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5, 
N = 3, Score = 413, P = 9.1e-60 

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein"; 
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII 
project), N = 3, Score = 193, P = 2.5e-24 

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER)., N 
= 1, Score = 180, P = 7.9e-11 


>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9 
Length = 488 

HSPs: 

Score = 449 (67.4 bits). Expect = 8.2e-69, Sum P(2) = 8.2e-69 

Identities = 88/204 (43%), Positives = 125/204 (61%) 

Query: 58 SALIVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 117 

+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W 

Sbjct: 29 AGVLTQVQTYYNISDSLGGLIQTVFLISFMVFSPVCGYLGDRFNRKWIMIIGVGIWLGAV 88 

Query: 118 LGSSFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGS 177 

LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS 
Sbjct: 89 LGSSFVPANHFWLFLVLRSFVGIGEASYSNVAPSLISDMFNGQKRSTVFMIFYFAIPVGS 148 

Query: 178 GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVER HSDLPPL 233 

GLG+I GS V + G W W +RV+ G++ ++ L L EP RGA ++ D+ 
Sbjct: 149 GLGFIVGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVWT 208 

Query: 234 NPTSWWADLRALARNLIFGLITCLTG 259 

T++ DL L + L+ C G 

Sbjct: 209 TNTTYLEDLVILLKTPT — LVACTWG 232 

Score = 267 (40.1 bits). Expect = 8.2e-69, Sum P(2) = 8.2e-69 
Identities = 74/212 (34%), Positives = 113/212 (53%) 

Query: 249 LIFGLITCLTGVLGVGLGVEISRRL RHSNPRADPLVCATGLLGSAPFLFLSL 300 

L FG it G++GV G +S+ L R RA PLV G L +APFL + + 

Sbjct: 277 LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 336 

Query: 301 ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 360 
S+V YI IF G T + NW + D+L V+ P RRSTA ++ +++SHL GDA 
Sbjct: 337 IFGDKSLVLLYIMIFFGITFMCFNWGLNIDMLTTVIHPNRRSTAFSYFVLVSHLFGDASG 396 

Query: 361 PYLIGLISDRLRRN— WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRR— 416 

PYLIGLISD +R +P ++ +L + C + L + +++ + +DR+ 

Sbjct: 397 PYLIGLISDAIRHGSTYPKD QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 453 

Query: 417 RAQLHVQGLLHEA--GSTD-DRIVVPQRGRSTRV 447 

RA++ + L + STD +RI + S + R+ 

Sbjct: 454 RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 488 

Score = 70 (10.5 bits). Expect = 5.9e-24, Sum P(2) = 5.9e-24 
Identities = 25/89 (28%), Positives = 41/89 (46%) 

Query: 62 VAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT--LG 119 

V L +NLLNY+DR+TVA ++ + LG +L+ +S V LG 

Sbjct: 11 VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI— SFMVFSPVCGYLG 68 

Query: 120 SSFIPGEHFWLLLLTRGLVGVGEASYSTIAP 150 

F W++++ G + +G S+ P 

Sbjct: 69 DRF NRKWIMIIGVG-IWLGAVLGSSFVP 95 


Pedant information for DKFZphmcf1_1e15, frame 1 


Report for DKFZphmcf1_1e15.1 


[LENGTH] 454 

[MW] 49013.35 

[pI] 7.66 

[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4 2e-51 

[BLOCKS] BL01022D 

[PROSITE] MYRISTYL 11 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] TRANSMEMBRANE 8 

[KW] LOW COMPLEXITY 15.42 % 


SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL 


SEG 
PRO 
MEM 


SEG 
PRD 


xxxxxxxxxxxxxxxx 

cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh 
MMMMMMMMMMMMMMMMMMMMMMM 


SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS 

SEG 

PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG 

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce 
MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YIAGSKVKDMAGDWHWALRVTPGLGWAVLLLFLWREPPRGAVERHSDLPPLNPTSWWA 

SEG xxxxxxxxxxxxx 

PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh 

MEM MMMMMMMMM 


SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 

SEG 

PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc 

MEM MMMM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMM 

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL 

SEG xxxxxxxxxxxxx 

PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh 

MEM MMMMMMMM MM 

SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVH 

SEG 

PRD hhhhhhhhccccceeeeeeccccccceeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMIy^^MM^ff^ 


Prosite for DKFZphmcfl_lel5.1 

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005 
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006 
PS00008 26->32 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 52->58 MYRISTYL PDOC00008 
PS00008 139->145 MYRISTYL PDOC00008 
PS00008 176->182 MYRISTYL PDOC00008 
PS00008 252->258 MYRISTYL PDOC00008 
PS00008 262->268 MYRISTYL PDOC00008 
PS00008 266->272 MYRISTYL PDOC00008 
PS00008 288->294 MYRISTYL PDOC00008 
PS00008 305->311 MYRISTYL PDOC00008 
PS00008 397->403 MYRISTYL PDOC00008 
PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013 

(No Pfam data available for DKFZphmcfl_lel5.1) 
DKFZphmcfl_lgl3 


group: mammary carcinoma derived 

DKFZphmcfl_lgl3 encodes a novel 573 amino acid protein with very weak similarity to the human 
KIAA0543 protein and Musca domestica hermes transposase. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary carcinoma- 
specific genes. 


similarity to KIAA0766 

complete cDNA, complete cds, few EST hits 

on genomic level encoded by AC005020, no splicing, genomic? 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2210 bp 

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176 


1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA 
51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC 
101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG 
151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT 
201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT 
251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT 
301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT 
351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG 
401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC 
451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTACAC GGCTGCAGTC 
501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT 
551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA 
601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA 
651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT 
701 GGAAACATTG TAAAGGAATT TCAAGTGATG GAACAGCAAA TATGACCGGA 
751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC 
801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA 
851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT 
901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC 
951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT 
1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG 
1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA 
1101 AGACGACATT TGGGTAACTA AATTGGCATA TTTAAGTGAT ATTTTTGGCA 
1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT 
1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA 
1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT 
1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA 
1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA 
1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA 
1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG 
1501 GAGCCTGAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT 
1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA 
1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA 
1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT 
1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG 
1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA 
1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT 
1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA 
1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA 
1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA 
2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG 
2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG 
2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT 
2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG 
2201 AAAAAAAAAA 


BLAST Results 


Entry AC005020 from database EMBL: 

Homo sapiens clone GS259H13; HTGS phase 1, 4 unordered pieces. 
Score = 9110, P = 0.0e+00, identities = 1822/1822 
Medline entries 

No Medline entry 

Peptide information for frame 1 


ORF from 94 bp to 1812 bp; peptide length: 573 
Category: similarity to unknown protein 
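
The ORF coordinates and the peptide length quoted in this listing can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon; the function name `orf_peptide_length` is hypothetical and not part of the original analysis pipeline.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumes the end coordinate is the last base of the last sense codon,
    i.e. the stop codon is not included in the span.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# ORF coordinates as quoted in this listing for three clones
print(orf_peptide_length(94, 1812))   # DKFZphmcfl_lgl3
print(orf_peptide_length(144, 1280))  # DKFZphtes3_14g5
print(orf_peptide_length(33, 1976))   # DKFZphtes3_14h21
```

Under these assumptions the computed codon counts agree with the peptide lengths stated for the three ORFs (573, 379 and 648 residues).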


1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL 
51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK 
101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI 
151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK 
201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS 
251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE 
301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI 
351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP 
401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI 
451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI 
501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM 
551 RVALSSCVPD WKELMNRQAH PSH 

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW: 

gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens 

PAC clone DJ0751H13 from 7q35-qter, complete sequence. 

Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179 

Entry MD36211_1 from database TREMBL: 

product: "Hermes transposase"; Musca domestica Hermes transposase 

gene, complete cds. 

Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465 


Alert BLASTP hits for DKFZphmcfl_lgl3, frame 1 

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P 
= 1.1e-23 

>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds. 
Length = 607 

HSPs: 

Score = 300 (45.0 bits), Expect = 1.1e-23, P = 1.1e-23 
Identities = 120/485 (24%), Positives = 229/485 (47%) 

CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 
CM+ ++R + + L+ + LS + +RI +1 ++L L R + +++ LD+ 


+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
FVAYEMYLLVFI RGVGPELEVQEDLLTIINLTHHFS VGALMS AI LES— LQTAGLSLQR 240 

KGISSDGTANMTGKHSRLTEKLLEATHNNAVWN — HC — FIHREALVSKEISPSLMDVL 261 
G+++ T M G++S L + E + WN H F+H E L S ++ + ++ 


IK + + +E H + + WL +GK L ++ LR E+ 


FLV + + F D W+ +L DI L ELS +f+ +HI F+ 


Query: 

89 

Sbjct: 

124 

Query: 

148 

Sbjct: 

183 

Query: 

206 

Sbjct: 

241 

Query: 

262 

Sbjct: 

299 

Query: 

321 

Sbjct: 

359 
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Query: 

381 

Sbjct : 

418 

Query: 

437 

Sbjct: 

474 

Query: 

496 

Sbjct: 

526 

Query: 

552 

Sbjct: 

586 

L L+Q ++ 

FP L + ++E 

-NIINEDCLKEIKLEILLHLTSLSQTFNY 436 
N +E + ++++ L + F 

+F + +F 

F+ 

+K+4'+ + +PF F+ + I + +E 
-IKKDLELFSNPFNFKPEYAPISVRVE- 

L +L ++ L N Y+I L 
-LTKLQANTNLWNEyRIKDL 525 

+ +P++ 

F + +CE FS LTR + 

-PDMR 551 
R 

VA + P W +L+ R+ + S+ 


Score = 290, Expect = 1.5e-22, P = 1.5e-22 
Identities = 120/485 (24%), Positives = 228/485 (47%) 


Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147 
CM+ ++R + + L+ + LS + +RI +I ++L L R + +++ LD+ 
Sbjct: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182 

Query: 148 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205 
+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
Sbjct: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES— LQTAGLSLQR 240 

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNHCFIHREALVSKEISPSLMDV-LKNA 264 
G+++ T M G++S L + E + WN IH + E+ S DV + 
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWN—VIHYSGFLHLELLSSY-DVDVNQI 297 

Query: 265 VKTVN FIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNE 319 
+ T++ IK + + +E H + + WL +GK L ++ LR E 
Sbjct: 298 INTISEWIVLIKTRGVRRPEFQTLLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKE 357 

Query: 320 IYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQ 379 
+ FLV + + F D W+ +L DI L ELS +++ +HI F+ 
Sbjct: 358 MEAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFE 416 

Query: 380 KTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIKL EILLHLTSLSQTFN 435 
L L+Q ++ + FP L + ++E + + ++K+ ++L + F 
Sbjct: 417 VKLNLFQRHIEEKNLTD— FPALREWDE— LKQQNKEDEKIFDPDRYQMVICRLQKEFE 472 

Query: 436 YYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILS 494 
+ F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I 
Sbjct: 473 RHFKDLRF--IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRIKD 524 

Query: 495 LSAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA— PDM 550 
L F+ + + +P++ + + F + +CE FS LTR + L 
Sbjct: 525 LGQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALF 584 

Query: 551 RVALSSCVPDWKELMNRQAHPSH 573 
RVA + P W +L+ R+ + S+ 
Sbjct: 585 RVATTEMEPGWDDLV-RERNESN 606 


Pedant information for DKFZphmcfl_lgl3, frame 1 


Report for DKFZphmcfl_lgl3.1 


[LENGTH] 573 
[MW] 66276.85 
[pI] 5.82 
[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens mRNA for KIAA0766 protein, complete cds. 1e-18 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 8.90 % 


SEQ MTPESRDTTDLSPGGTQEMEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA 

SEG xxxxxxx 

PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh 

SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDMVRTIFDDKSADKLRTIPLSDNTISRRIC 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhcccccceeeeecccchhhhhhh 


SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS 

SEG 

PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce 

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH 

SEG 

PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee 

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE 

SEG 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhhh 

SEQ VRWLSQGKVLSRVYELRNEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhchhhhhcccceeehhhhhhhhhhhhhhhhhhh 

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK 

SEG xxxxx 

PRD hhccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhcchhhhhh 

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL 

SEG xxxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh 

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK 

SEG xxx xxxxxxxxxxx 

PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh 

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH 

SEG 

PRD hcccccccccceeeccccccchhhhhhhccccc 


Prosite for DKFZphmcfl_lgl3.1 

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001 
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005 
PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005 
PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005 
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006 
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006 
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006 
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006 
PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007 
PS00008 137->143 MYRISTYL PDOC00008 
PS00008 273->279 MYRISTYL PDOC00008 
PS00008 289->295 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphmcfl_lgl3.1) 
DKFZphtes3_14g5 


group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to murine cell 
growth regulating nucleolar protein LYAR. 

The novel protein is very similar to murine Ly-1 antibody reactive clone protein (LYAR). It 
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate groups of 
an ATP/GTP nucleotide), but not the zinc finger motif and nuclear localization signals of 
LYAR. 
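
The P-loop hit described above can be reproduced with a regular-expression scan. This is a minimal sketch, assuming the PROSITE PS00017 (ATP_GTP_A) consensus `[AG]-x(4)-G-K-[ST]`; the helper name `find_p_loops` is hypothetical, and the peptide fragment is the first 70 residues of the DKFZphtes3_14g5 translation as given in the SEQ rows of this listing.

```python
import re

# PROSITE PS00017 (ATP_GTP_A, P-loop) consensus written as a regular expression
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(peptide: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) residue positions of each P-loop match."""
    # m.start() is 0-based, so +1 gives the 1-based first residue;
    # m.end() is 0-based exclusive, which equals the 1-based last residue.
    return [(m.start() + 1, m.end()) for m in P_LOOP.finditer(peptide.upper())]


# First 70 residues of the DKFZphtes3_14g5 peptide
N_TERMINUS = (
    "MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDF"
    "WGDDYKNHVKCISEDQKYGGKGYEGKTHKG"
)

print(find_p_loops(N_TERMINUS))
```

The scan locates the eight-residue match starting at residue 60. The Prosite report in this listing gives the same start with an end coordinate one higher, which appears to be the convention used for all motif ranges here.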

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440 
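
The poly(A) stretch and polyadenylation signal positions can be located programmatically. The sketch below is illustrative only: it assumes the canonical `AATAAA` hexamer is the signal being reported, and the function name `polya_features` and its `offset` parameter are hypothetical. The test fragment is the last 103 bases (positions 1401 to 1503) of the DKFZphtes3_14g5 insert given below.

```python
def polya_features(seq: str, offset: int = 0, signal: str = "AATAAA"):
    """Locate the terminal poly(A) run and the nearest upstream signal hexamer.

    Returns (signal_position, tail_position) as 0-based indices shifted by
    `offset`, the 0-based coordinate of the first base of `seq` within the
    full insert. signal_position is None if no hexamer is found.
    """
    seq = seq.upper()
    tail_start = len(seq.rstrip("A"))               # index of first base of the terminal A run
    signal_start = seq.rfind(signal, 0, tail_start)  # last hexamer lying wholly before the tail
    signal_pos = None if signal_start < 0 else offset + signal_start
    return signal_pos, offset + tail_start


# Bases 1401-1503 of the DKFZphtes3_14g5 insert (0-based offset 1400)
TAIL_FRAGMENT = (
    "AGTTAAGAAAATATATTTTTGGTATAACTTTTATGAGAAAAATAAAATAT"
    "ATTCTGGTCCAAACTTC" + "A" * 36
)

print(polya_features(TAIL_FRAGMENT, offset=1400))
```

With 0-based coordinates this reproduces the positions quoted above (signal at 1440, poly(A) stretch at 1467); in 1-based coordinates each feature starts one base later, so the listing appears to count from zero or to name the preceding base.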


1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT 
51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT 
101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT 
151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA 
201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG 
251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA 
301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA 
351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA 
401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG 
451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG 
501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT 
551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC 
601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT 
651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA 
701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA 
751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG 
801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA 
851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA 
901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG 
951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA 
1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG 
1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG 
1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT 
1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG 
1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG 
1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA 
1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG 
1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA 
1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT 
1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1501 AAA 


BLAST Results 


NO BLAST result 


Medline entries 


93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is 
involved in cell 

growth regulation. 




Peptide information for frame 3 


ORF from 144 bp to 1280 bp; peptide length: 379 
Category: strong similarity to known protein 
Classification: Cell division 
Prosite motifs: ATP_GTP_A (60-68) 
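
The peptide below can be checked against the nucleotide sequence by translating the ORF in frame. The following is a minimal sketch using the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical, and the test string is bases 144 to 176 of the DKFZphtes3_14g5 insert listed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 144-176 of the insert: the first eleven codons of the ORF
ORF_START = "ATGGTATTTTTTACATGCAATGCATGTGGTGAA"
print(translate(ORF_START))
```

The output corresponds to the first eleven residues of the peptide. The same routine applied to bases 144 to 1280 would be expected to yield the full 379-residue translation, provided the nucleotide text is free of extraction errors.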


1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK 

51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ 

101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ 

151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK 

201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK 

251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED 

301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS 

351 EEELLVIFNK KISKNPTFKL LKDKVKLVK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14g5, frame 3 

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N = 
1, Score = 1410, P = 2.7e-144 

SWISSPROT:YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN 
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35 

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding 
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic 
sequence, complete sequence., N = 3, Score = 139, P = 4e-15 

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11 


>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388 

HSPs: 

Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144 
Identities = 275/388 (70%), Positives = 317/388 (81%) 

Query: 1 MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 60 

MVFFTCNACGESVKKIQVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG 
Sbjct: 1 MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG 60 

Query: 61 KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 120 

KGYE KTHKGD KQQAWIQKI+ELIK+PNVSPKVRELL+QISAFDNVP KKAKFQNWMKN 
Sbjct: 61 KGYEAKTHKGDAKQQAWIQKINELIKKPNVSPKVRELLQQISAFDNVPIKKAKFQNWMKN 120 

Query: 121 SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEIS-TKVPASKVKDAVE 179 

SLKVH++S+L+QVW+IFSEAS+SE ++Q Q P H A PHAE+ TKVP++K E 
Sbjct: 121 SLKVHSDSVLEQVWDIFSEASSSE— QDQQQPPSH-TAKPHAEMPITKVPSAKTNGTTE 176 

Query: 180 QQGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVP 239 

+Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA EA GE+ 
Sbjct: 177 EQTEAKKNKRERKEERQKNRKKEKKELKLENHQENLRGQKPKKRKKNQEAGHEAAGEDGA 236 

Query: 240 EANG SAGKRSKKKKQRKDSASEEEA RVGAGKRKR-RHSBVETDSKKKKM 287 

+ +G G+ S++ R E+ A + AGBCRKR +HS E+ KKKKM 

Sbjct: 237 DGSGFPEKKKAQGGQASEEGADRNGGPGEDRAEGQTKTAAGfCRKRPKHSGAESGYKKKKM 296 

Query: 288 KLPEHPE6GEPEDDEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEH 347 

KLPE PE GE -t-D EAP+KGKFNWKGTIKA+LKQAPDNEI++KKL+KKV+AQY+ V ++ 
Sbjct: 297 KLPEQPEEGEAKDHEAPSKGKFNWKGTIKAVLKQAPDNEISVKKLKKKVIAQYHAVMNDT 356 

Query: 348 HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK 379 

EEELL IFN+KIS+NPTFK+LKD+VKL+K 
Sbjct: 357 SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK 388 
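
The "Identities" figure in a BLAST HSP is the number of alignment columns in which query and subject carry the same residue. The sketch below shows how that count can be recomputed for a single pair of rows; the function name `count_identities` is hypothetical. The "Positives" figure additionally depends on the substitution matrix used by the search and is not reproduced here.

```python
def count_identities(query_row: str, sbjct_row: str) -> tuple[int, int]:
    """Return (identical columns, total columns) for one pair of aligned rows.

    Rows must be the same length; '-' marks a gap and never counts as identical.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    identical = sum(
        1 for q, s in zip(query_row.upper(), sbjct_row.upper())
        if q == s and q != "-"
    )
    return identical, len(query_row)


# Last row of the LYAR alignment above (query 348-379, subject 357-388)
QUERY = "HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK"
SBJCT = "SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK"
print(count_identities(QUERY, SBJCT))
```

Summing these per-row counts over all rows of an HSP should give the numerator and denominator of the reported identity fraction, as long as the rows are intact.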


Pedant information for DKFZphtes3_14g5, frame 3 
Report for DKFZphtes3_14g5.3 


[LENGTH] 379 
[MW] 43634.03 
[pI] 9.59 
[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11 
[BLOCKS] BL00603D Thymidine kinase cellular-type proteins 
[BLOCKS] BL00530C 
[PROSITE] ATP_GTP_A 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 18.73 % 

SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 
SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ 
SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 
SEG . . xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK 
SEG xxxxx 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 


Prosite for DKFZphtes3_14g5.3 
PS00017 60->68 ATP_GTP_A PDOC00017 


(No Pfam data available for DKFZphtes3_14g5.3) 
group: nucleic acid management 

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus 
RNA helicase and several RNA-dependent ATPases from the DEAD box family. 

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and an ATP/GTP-binding site motif A (P-loop) 
and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 


strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 
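
The start-codon context noted above can be inspected directly from the 5' end of the insert. This is a minimal sketch, assuming the "ACNatg" notation means an A at position -3 and a C at position -2 relative to the ATG, with any base at -1; the function name `start_context` is hypothetical, and the test string is the first 40 bases of the sequence listed below.

```python
def start_context(seq: str, start_1based: int, upstream: int = 3) -> tuple[str, str]:
    """Return (upstream bases, start codon) for a start codon at a 1-based position."""
    i = start_1based - 1
    if i < upstream or i + 3 > len(seq):
        raise ValueError("position too close to the sequence ends")
    seq = seq.upper()
    return seq[i - upstream:i], seq[i:i + 3]


# First 40 bases of the DKFZphtes3_14h21 insert
FIVE_PRIME = "CAACGACGTCGGACGCGCCCCTTCTTGGAACAATGTCCCA"

context, codon = start_context(FIVE_PRIME, 33)
# "ACN" upstream of "ATG": A at -3 and C at -2, any base at -1
print(context, codon, context.startswith("AC") and codon == "ATG")
```

In this fragment the ATG at position 33 is also the first ATG in the sequence, which is consistent with the ORF start reported for this clone.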

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2200 bp 

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140 


1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA 
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT 
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC 
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG 
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC 
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA 
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA 
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA 
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA 
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT 
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG 
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG 
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT 
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT 
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA 
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC 
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG 
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA 
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT 
951 CAACCCAGCC TTAAAGGTCA AAGGAATAGA CCCGGCATGT TAGTTCTAAC 
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT 
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT 
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC 
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA 
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT 
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC 
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT 
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT 
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA 
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG 
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC 
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA 
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA 
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT 
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA 
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA 
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG 
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG 
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC 
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT 
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT 
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA 
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA 
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294) 
DEAD_ATP_HELICASE (394-403) 


1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR 
51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT 
101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ 
151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK 
201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY 
251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG 
301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY 
351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK 
401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV 
451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA 
501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG 
551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS 
601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21, frame 3 

TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A, N = 1, Score = 1008, P = 1.1e-101 

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like 
protein."; S.pombe chromosome II P1 p8B7., N = 1, Score = 971, P = 
9.1e-98 

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 970, P = 1.2e-97 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 961, P = 1e-96 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91 


>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A 

Length = 504 

HSPs: 

Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101 
Identities = 211/473 (44%), Positives = 298/473 (63%) 

Query: 174 DQIREEGLKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEK 233 

D++++E W K PI ++ YK +S + + 

Sbjct: 23 DRLKDENFSWMK PIVRDLYKIPNEQKNLSPEQLQELYTNGGVMKVYPFREEST 75 

Query: 234 RPIPNPTCTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKT 293 

IP P +F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT 

Sbjct: 76 VKIPPPVNSFEQAFGSNASIMGEIRKNGFEKPSPIQS(»«WPLLLSGQDCIGVSQTGSGKT 135 

Query: 294 LCYLMPG FIHLVLQPSL KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348 

L +L+P +H+ Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC 

Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195 

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408 
+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE 

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255 

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468 

I +IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524 

+ ++ + FL + + K+I+FV K +ADHLSSD + 1+ + LHG R Q 

Sbjct: 316 PHDSRFLRVCEIVNFLT7VAHGQNYKMIIFVKSKV^4ADHLSSDFCMKGI^ISQGLHGGRSQS 375 

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584 

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR 
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644 

G +++ L ND LI ILE++ Q +P++L AE++ K + R RP R 

Sbjct: 436 GEAMSFLWWNDRSMFEGLIQILEKSEQEVPDQLRRDAEKYRL— KCQSGRDGPRPSFRN 492 

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 


Pedant information for DKFZphtes3_14h21, frame 3 
Report for DKFZphtes3_14h21.3 


[LENGTH] 648 
[MW] 72873.51 
[pI] 8.84 
[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 4e-72 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-70 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61 
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 2e-49 
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 9e-45 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 2e-28 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 5e-10 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 4e-96 
[PIRKW] RNA binding 3e-87 
[PIRKW] DEAD box 5e-50 
[PIRKW] transmembrane protein 4e-27 
[PIRKW] DNA binding 3e-67 
[PIRKW] recF recombination pathway 3e-10 
[PIRKW] ATP 4e-96 
[PIRKW] purine nucleotide binding 5e-50 
[PIRKW] P-loop 4e-96 
[PIRKW] hydrolase 9e-45 
[PIRKW] protein biosynthesis 5e-50 
[PIRKW] ATP binding 1e-61 
[SUPFAM] WW repeat homology 8e-88 
[SUPFAM] DEAD/H box helicase homology 4e-96 
[SUPFAM] unassigned DEAD/H box helicases 7e-87 
[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96 
[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43 
[SUPFAM] recQ protein 3e-10 
[SUPFAM] Bloom's syndrome helicase 5e-07 
[SUPFAM] translation initiation factor eIF-4A 5e-50 
[SUPFAM] recQ helicase homology 3e-10 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-88 
[PROSITE] DEAD_ATP_HELICASE 1 
[PROSITE] ATP_GTP_A 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] KH domain family of RNA binding proteins 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 8.49 % 

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM 

SEG . .- xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG 

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI 

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcccccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 


Prosite for DKFZphtes3_14h21.3

PS00017 286->294 ATP_GTP_A PDOC00017 
PS00039 394->403 DEAD_ATP_HELICASE PDOC00039 
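The Prosite hits listed above are matches of short residue patterns against the protein sequence. A minimal sketch of such a scan follows, assuming the published PS00017 (ATP_GTP_A, P-loop) pattern `[AG]-x(4)-G-K-[ST]`; the function names `prosite_to_regex` and `scan` are hypothetical helpers, not part of the reporting pipeline used here, and only the simple pattern elements shown are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern, e.g. '[AG]-x(4)-G-K-[ST]', to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as 'x(4)' into its core ('x') and count ('4').
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} means none of these residues
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


def scan(sequence: str, pattern: str, offset: int = 1):
    """Return (start, end, matched text) with 1-based inclusive coordinates.

    `offset` is the sequence position of the first residue of `sequence`.
    """
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + offset, m.end() + offset - 1, m.group())
            for m in regex.finditer(sequence)]


# Residues 248-296 of the query, copied from the Pfam alignment in this report.
fragment = "QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY"
print(scan(fragment, "[AG]-x(4)-G-K-[ST]", offset=248))
```

The start coordinate found this way agrees with the 286 reported for PS00017. The sketch reports an inclusive end coordinate, which may differ by one from the convention used in the listing.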


Pfam for DKFZphtes3_14h21.3
HMM_NAME DEAD and DEAH box helicases 

HMM * gLpPWI LRn I yeMGFEk PTPI QQqAI Pil LeGRDVMACAQTGSGKTAAF 

P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296 

HMM II PMLQHIDwdPWpqpPQd . . PrALILAPTRELAMQIQEEcRkFgkHMng 

L+P ++H+ +P +++ Q+ P +L+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343 

HMM IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleM 
+R++C+YGG N ++Q+++L++G+ +I+IATPGRL D+ +++ ++L++H-+ 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392 

HMM LVMDEADRMLDMGFIDQIRrlMrqlPMpwNRQTMMFSATMPdelqELARr 

LV+DEAD+MLDMGF-H-Qr++I+ ++ ++RQT+M SAT+P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR— PDRQTVMTSATWPHSVHRLAQS 440 
HMM FMRNPIRInId.MdElTtnEnIkQwYiyVerEMWKfdcLcrLIe*

++++P + ++ D +++ +KQ +I+ E++K + ++++
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 482


HMM_NAME KH domain family of RNA binding proteins

HMM *rIiIPedhMGMIIGKGGsNIRqIREEYgvrINIPdecCeDstdRIITIt

+ + ++++G++IG+GGS I++I++ ++++I I++E+ + + + I
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-P---ESLVKIF 115

HMM G*

G
Query 116 G 116


HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdelMddFNnGEynVLIcTD

+ +++ L+ + +I+V ++HGD++Q++R++++++F++G+ ++LI+TD
Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545

HMM VggRGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*

+++RG+D+ DV HV+N+D+P+N+E Y++RIGRTGR+G
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG
DKFZphtes3_14p14


group: testes derived 

DKFZphtes3_14p14 encodes a novel 159 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown

Insert length: 3969 bp

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927


1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 
101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 
151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 
201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 
251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 
301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGT7GGGCT TGACATTCAG 
351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 
401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 
451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT 
501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 
551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 
601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 
651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 
701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 
751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 
801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 
851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATTCAGAAAA 
901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 
951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA 
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCTTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA 
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT 
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAG GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG 
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC 
2001 ATGCCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG 
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGCA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGGC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC 
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA 
2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG 
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA 
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG 
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 
3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT 
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG 
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG 
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 
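Sequence listings in this format give a 1-based start position followed by blocks of ten bases. The following is a minimal sketch of how such a listing could be read back into one contiguous string; `parse_listing` is a hypothetical helper, and the consistency check on the position column is an assumption about what a careful reader would want, not something the original pipeline is known to do.

```python
def parse_listing(text: str) -> str:
    """Join a numbered, block-formatted sequence listing into one string.

    Each data line is expected to look like '51 ATTACTCAAC AGCTCTCCAG ...',
    where the leading number is the 1-based position of the first base.
    """
    sequence = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and anything without a position column
        start = int(fields[0])
        if start != len(sequence) + 1:
            # A mismatch usually means a dropped or OCR-damaged line.
            raise ValueError(f"expected position {len(sequence) + 1}, found {start}")
        sequence += "".join(fields[1:])
    return sequence


listing = """
1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG
"""
sequence = parse_listing(listing)
print(len(sequence), sequence[:10])
```

Raising on a position mismatch is a deliberate choice: it flags lines where the position column or a base block was damaged instead of silently shifting every downstream coordinate.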


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 
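The ORF coordinates and the peptide length reported for each clone can be cross-checked with simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the reported range excludes the stop codon; the helper name `peptide_length` is hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF range without the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of three")
    return span // 3


# ORF ranges as reported for three of the clones in this section.
print(peptide_length(216, 692))    # DKFZphtes3_14p14, reported length 159
print(peptide_length(20, 2125))    # DKFZphtes3_14p7, reported length 702
print(peptide_length(116, 1276))   # DKFZphtes3_15a13, reported length 387
```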


1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR 
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG
151 SGRSYSSLQ 
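The peptide above is the conceptual translation of the ORF in the reported reading frame. A minimal sketch of that translation with the standard genetic code follows; the names `CODON_TABLE` and `translate` are hypothetical, and the example uses only positions 216 to 260 of the nucleotide listing, which should give the first fifteen residues.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start: int, end: int) -> str:
    """Translate the 1-based, inclusive range start..end of a DNA string."""
    orf = dna.upper()[start - 1:end]
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # Codons containing anything other than A, C, G or T become 'X'.
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf), 3))


# Positions 216-260 of the DKFZphtes3_14p14 insert, copied from the listing.
fragment = "ATGGAACGTTGGGCCATGCGTGTGAACGAGCTCTATGTCGATGAC"
print(translate(fragment, 1, len(fragment)))
```

Mapping unreadable codons to `X` rather than raising keeps the sketch usable on listings that still contain OCR-damaged bases.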


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p14, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_14p14, frame 3


Report for DKFZphtes3_14p14.3


[LENGTH] 159
[MW] 17778.55
[pI] 5.74
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04
[KW] Alpha_Beta


SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS 

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ EGSCPAAWPFLLRIWMGVQAPWGFKPLMAGSGRSYSSLQ
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc

(No Prosite data available for DKFZphtes3_14p14.3)
(No Pfam data available for DKFZphtes3_14p14.3)

DKFZphtes3_14p7


group: testes derived 

DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 


weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 
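The header reports where the poly(A) stretch begins and where a polyadenylation signal was found upstream of it. The sketch below shows one way such a signal could be located, assuming the search is for the canonical hexamer AATAAA or its common variant ATTAAA within a fixed window before the poly(A) start; the window size, the helper name `find_polya_signals` and the 1-based coordinate convention are illustrative assumptions, not the method documented for this report.

```python
SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and its most common variant


def find_polya_signals(seq: str, polya_start: int, window: int = 50):
    """Return (position, hexamer) pairs found in the window before the poly(A) start.

    `polya_start` and the returned positions are 1-based. Only hexamers that
    end before the poly(A) stretch begins are reported.
    """
    seq = seq.upper()
    region_end = polya_start - 1               # 0-based, exclusive
    region_start = max(0, region_end - window)
    hits = []
    for i in range(region_start, region_end - 5):
        hexamer = seq[i:i + 6]
        if hexamer in SIGNALS:
            hits.append((i + 1, hexamer))
    return hits


# Synthetic example: a signal at position 5 and a poly(A) stretch from position 31.
example = "GGGG" + "AATAAA" + "C" * 20 + "A" * 20
print(find_polya_signals(example, polya_start=31))
```

Positions produced this way may differ by one from those in the report, depending on whether the report counts from the first base of the hexamer.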


1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAT^TAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA 
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA 
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG 
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA 
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG 
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT 
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT 
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT 
901 TGATTCATCA TTAGTAAGAA GTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 
1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGTGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG 
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA 
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT 
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCAGTAACAA CATGGATGGA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTGCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTG CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT 
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT 
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA 
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT 
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA


BLAST Results 
No BLAST result


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 


1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS
51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD
101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI
151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN
201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI
251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR
301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS
351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT
401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA
451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV
501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ
551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD
601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLLLSSFL
651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP
701 SF


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2 

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B,
complete cds., N = 2, Score = 97, P = 0.00039


>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete
cds.

Length = 772


HSPs:

Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 45/163 (27%), Positives = 77/163 (47%)


Query: 442 LTRVLANIAIHPGVGPVLAANPGIVGLLLTTLEYKSLDDCEELVINATATTNNLSYYQVK 501 

L +++ NI+ H G P VG L + S D+ EE VI T+ NL+ + 
Sbjct: 483 LMKMIRNISQHDG— PTKNLFIDYVGDLAAQI SSDEEEEFVIECLGTLANLTIPDLD 537 

Query: 502 -NSIIQDKKLYIAELLLKLLVSNNMDG-ILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMA 559 

++++ KL + L KL D +LE V + G +S D + ++ + ++ 

Sbjct: 538 WELVLKEYKL-VPFLKDKLKPGAAEDDLVLEWIMIGTVSMDDSCAALLAKSGIIPALIE 596 

Query: 560 LLDAQHQDICFSACGVLL NLTVDKDKR-VILKEGGGIKKLVDCLRD 604 

LL+AQ +D F C ++ + + R VI+KE L+D + D 

Sbjct: 597 LLNAQQEDDEF-VCQIIYVFYQMVFHQATRDVIIKETQAPAYLIDLMHD 644 
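Each HSP is summarised by an identities count and a percentage. The percentages printed in these reports are consistent with integer truncation rather than rounding (45/163 is about 27.6% but is shown as 27%). The sketch below illustrates both the counting and the truncation; the helper names are hypothetical, and treating the printed percentage as floor division is an inference from the numbers in this report, not a documented property of the program that produced it.

```python
def alignment_identities(query: str, sbjct: str):
    """Count identical, non-gap columns in two aligned strings of equal length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


def percent(count: int, length: int) -> int:
    """Percentage truncated toward zero, as the figures in the report suggest."""
    return 100 * count // length


# Figures taken from HSP summaries in this section.
print(percent(45, 163), percent(77, 163))    # reported as 27% and 47%
print(percent(42, 178), percent(82, 178))    # reported as 23% and 46%

# A short aligned pair from the report: Query 342 CCTAL, Sbjct 425 CIPQL.
print(alignment_identities("CCTAL", "CIPQL"))
```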

Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 42/178 (23%), Positives = 82/178 (46%)

Query: 169 KLAKIILALKVSRKNLLNVCK-LIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNME 227 

K K L V ++ LL V L+ ++ + + + ++N +1+ L++ L + N E 
Sbjct: 263 KTFKKYQGLVVKQEQLLRVALYLLLNLAEDTRTELKMRNKNIVHMLVKALDRD NFE 318 

Query: 228 AFLYCMGSIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVT 287 

+ + +K +S + N+M+ VE L+ +1 +E++ L + + 

Sbjct: 319 LLILWSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDL LNITLR 366 

Query: 288 ATLRNLVDSSLVRSKFLNISALPQLCTAM—EQYKGDKDVCT— NIARI — FSKLTSYRD 341 

L D-l- L R+K + + LP+L + E YK +C +1+ F + +Y D 
Sbjct: 367 LLLNLSFDTGL-RNKMVQVGLLPKLTALLGNENYK-QIAMCVLYHISMDDRFKSMFAYTD 424 
Query: 342 CCTAL 346
C L 

Sbjct: 425 CIPQL 429 

Score = 69 (10.4 bits), Expect = 2.6e+00, Sum P(2) = 9.2e-01
Identities = 35/146 (23%), Positives = 70/146 (47%) 

Query: 512 IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 571

I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+ 

Sbjct: 304 IVHMLVKALDRDNFELLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDLLNI 363 

Query: 572 ACGVLLNLTVDKDKRVILKEGGGIKKLVDCLRDLGPTDW-QLACLVCKTLWNFSENITNA 630 

+LLNL+ D R + + G + KL L G ++ Q+A +C L++ S + 
Sbjct: 364 TLRLLLNLSFDTGLRNKMVQVGLLPKLTALL---GNENYKQIA--MC-VLYHISMD-DRF 416

Query: 631 SSCFGNEDT-NTLLLLLSSFLDEELALD 657 

S F D L+ +L DE + L+ 
Sbjct: 417 KSMFAYTDCIPQLMKMLFECSDERIDLE 444

Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03
Identities = 18/58 (31%), Positives = 30/58 (51%)

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241 

LI +++RN N + L+ N++ L +L VLR + +L TN+ +C S G 

Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHG 212 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00
Identities = 26/122 (21%), Positives = 53/122 (43%) 

Query: 283 LVQVTATLRNL VDSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTS 338 

+++ TL NL +D LV ++ +p L ++ + D+ + I s 

Sbjct: 521 VIECLGTLANLTIPDLDWELVLKEY-— KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 576 

Query: 339 YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 398

D C AL + S + L+N Q+ + V +++++ + + R+ KE + 

Sbjct: 577 MDDSCAALLAKSGIIPALIELLNAQQEDDEFVCQIIYVFYQMVF-HQATRDVIIKETQAP 635 

Query: 399 QTLLSL 404 
L+ L 

Sbjct: 636 AYLIDL 641 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00
Identities = 44/177 (24%), Positives = 79/177 (44%)

Query: 481 CE-ELVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 537 

CE E ++N T + NLS+ ++N ++Q ++ LLL + N IA+V + 
Sbjct: 355 CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGKENYKQI— AMCVLYH 409 

Query: 538 LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 596 
cw - ""^ ° F + + + M L + + I +NL +K ++ EG G+K 

Sbjct: 410 ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRNVQLICEGNGLK 469

Query: 597 KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEELAL 656 

R L D L+ K + N S++ + F + L +SS +EE + 

Sbjct: 470 MLMK— RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 522 

Query: 657 D 657 
+ 

Sbjct: 523 E 523 

Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02
Identities = 20/66 (30%), Positives = 34/66 (51%)

Query: 304 LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 362

LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N+I+ 

Sbjct: 171 LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 229 

Query: 363 YQKKQDL 369 
K+ +L 

Sbjct: 230 ELKRHEL 236 


Pedant information for DKFZphtes3_14p7, frame 2 
Report for DKFZphtes3_14p7.2


[LENGTH] 708
[MW] 79266.35
[pI] 6.57
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04
[BLOCKS] BL00923F Aspartate and glutamate racemases proteins
[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins
[PROSITE] MYRISTYL 9
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 11
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 7.49 %


SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH 
SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA 

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH 

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE 

SEG 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV 

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG 

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF 

SEG xxx 

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 


Prosite for DKFZphtes3_14p7.2 


PS00001 206->210 ASN_GLYCOSYLATION PDOC00001
PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00001 311->315 ASN_GLYCOSYLATION PDOC00001
PS00001 385->389 ASN_GLYCOSYLATION PDOC00001
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001
PS00001 500->504 ASN_GLYCOSYLATION PDOC00001
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001
PS00001 584->588 ASN_GLYCOSYLATION PDOC00001
PS00001 628->632 ASN_GLYCOSYLATION PDOC00001
PS00001 632->636 ASN_GLYCOSYLATION PDOC00001
PS00001 635->639 ASN_GLYCOSYLATION PDOC00001
PS00005 173->176 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005
PS00005 295->298 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005
PS00005 421->424 PKC_PHOSPHO_SITE PDOC00005
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006
PS00006 654->658 CK2_PHOSPHO_SITE PDOC00006
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006
PS00008 17->23 MYRISTYL PDOC00008
PS00008 64->70 MYRISTYL PDOC00008
PS00008 144->150 MYRISTYL PDOC00008
PS00008 384->390 MYRISTYL PDOC00008
PS00008 402->408 MYRISTYL PDOC00008
PS00008 473->479 MYRISTYL PDOC00008
PS00008 533->539 MYRISTYL PDOC00008
PS00008 580->586 MYRISTYL PDOC00008
PS00008 641->647 MYRISTYL PDOC00008
PS00009 67->71 AMIDATION PDOC00009


(No Pfam data available for DKFZphtes3_14p7.2) 
DKFZphtes3_15a13


group: testes derived 

DKFZphtes3_15a13 encodes a novel 387 amino acid protein with weak similarity to S. cerevisiae
Hop1.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to S. cerevisiae Hop1

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST
hits

S. cerevisiae Hop1p is a meiosis-specific protein

Sequenced by GBF 

Locus: unknown 

Insert length: 1848 bp 

Poly A stretch at pos. 1766, no polyadenylation signal found 

1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT 
51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA 

101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG 

151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT 

201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG 

251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG 

301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG 

351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC 

401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT

451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA

501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC 

551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA 

601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC

651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA 

701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT 

751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA 

801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA 

851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA 

901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT 

951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG 
1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA 
1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA 
1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA 
1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT 
1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA 
1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG 
1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT 
1351 CCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA 
1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT 
1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC
1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT 
1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA 
1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA 
1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT 
1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT 
1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 
1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 
Peptide information for frame 2


ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 


1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE 

51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED 

101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI 

151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP 

201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK 

251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES 

301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN 

351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI 

BLASTP hits 

No BLASTP hits available 


Alert BLASTP hits for DKFZphtes3_15a13, frame 2

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score =
274, P = 5.7e-22

TREMBL:SC9877_9 gene: "hop1"; S.cerevisiae chromosome IX cosmid 9877.,
N = 2, Score = 126, P = 7.1e-09

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 126, P = 7.8e-08


>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence.

Length = 562

HSPs:

Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22
Identities = 84/290 (28%), Positives = 145/290 (50%)


Query: 

22 

Sbjct : 

11 

Query: 

82 

Sbjct: 

71 

Query: 

131 

Sbjct: 

130 

Query: 

185 

Sbjct: 

190 

Query: 

236 

Sbjct: 

249 


TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + 


S +L+ W 


M G YDALQ+KY+ T 


- N PEDPQT I S EC YQFKFKYTNNG P- 
D I E Y F F Y+++ 


-LMDFISK- 
+M I++ 


-NQSN 130 
N+ N 


ST 


-DTKKAS I LLI RK I Y I LMQNLGPLPN DVCLTMKLFYYDEVTPPDYQPP 184 
+ ++ ++R + LM+ L +P++ + MKL YYD+VTPPDY+PP 


-GDCEGVI FEGEPMYLNVGEVSTPFHI FKVKVTT- 
D ++ P+ + +G V++ + +KV + 


-ERERMENIDSTILS 235 
E + M++ D + 


D ++ 


QE+ 


-TSDDLDIETKMEEQEKNPASSE 281 
DD D E ++ ++PA +E 


Pedant information for DKFZphtes3_15a13, frame 2


Report for DKFZphtes3_15a13.2


[LENGTH] 387
[MW] 44417.64
[pI] 5.57
[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence. 9e-23
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11
[PIRKW] nucleus 2e-09
[PIRKW] zinc finger 2e-09
[PIRKW] DNA binding 2e-09
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 3
[KW] Alpha_Beta


SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD 

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK 

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES 

PRD ^^^^hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKMANGNQPVKSSKENRKRSQHESGR

PRD ccccccchhhhhhhhhhcccccccccccccceeeeeccccccccchhhhhhhhhhcccce


SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI 
PRD eeeeecccccccccccccccccccccc 


Prosite for DKFZphtes3_15al3.2 

PS00001 127->131 ASN_GLYCOSYLATION PDOC00001
PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001
PS00004 140->144 CAMP_PHOSPHO_SITE PDOC00004
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004
PS00005 139->142 PKC_PHOSPHO_SITE PDOC00005
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00006 96->100 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 177->181 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006
PS00006 268->272 CK2_PHOSPHO_SITE PDOC00006
PS00006 280->284 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006
PS00006 346->350 CK2_PHOSPHO_SITE PDOC00006
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006
PS00008 84->90 MYRISTYL PDOC00008


(No Pfam data available for DKFZphtes3_15a13.2)

DKFZphtes3_15c24


group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to
2-hydroxyacid dehydrogenases.

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenases signature.
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase
(EC 1.1.1.28) catalyzes the oxidation of D-lactate to pyruvate; D-glycerate dehydrogenase
(EC 1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; 3-phosphoglycerate
dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of D-3-phosphoglycerate to
3-phosphohydroxypyruvate.
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase.

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent
pathways and as a new enzyme for biotechnological production processes.


strong similarity to C. elegans T03F1.1 

potential start at bp 55 matches Kozak consensus PyCCatgG 

Sequenced by GBF 

Locus : unknown 

Insert length: 1956 bp 

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903 


1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC 
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG 
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC 
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA 
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA 
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT 
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA 
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC 
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA 
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA 
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA 
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG 
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG 
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA 
701 TACAGCTTAT AATTCCTGGA GAATCTCKTTT GTTTTGCGTG TGCTCCACCA 
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT 
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG 
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC 
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC 
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA 
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA 
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT 
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG 
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC 
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGG/^G ACCTCATGGC 
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT 
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT 
1351 TAGGGCAACA TTAATTAATG TATATTCTTA CCTGAATTGT TATACTTTTT 
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC 
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA 
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT 
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA 
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC 
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG 
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC 
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA 
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA 
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG 
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA (yvCAAAAAAA AAAAAAAAAA 
1951 AAAAAG 
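
The polyadenylation signal position reported for this insert can be cross-checked with a short scan for the canonical hexamers. The sketch below is illustrative only: the function name `find_polya_signals` and the choice of AATAAA and ATTAAA as the hexamers to search for are assumptions, not a description of the annotation pipeline actually used. The fragment is bases 1901 to 1930 of the listing above, and the comparison assumes that the annotation reports zero-based offsets.

```python
def find_polya_signals(seq, signals=("AATAAA", "ATTAAA")):
    """Return sorted (zero-based offset, hexamer) pairs for every signal occurrence."""
    seq = seq.upper()
    hits = []
    for signal in signals:
        start = seq.find(signal)
        while start != -1:
            hits.append((start, signal))
            start = seq.find(signal, start + 1)
    return sorted(hits)


# Bases 1901-1930 of the DKFZphtes3_15c24 insert, copied from the listing.
fragment = "GCAAATAAAA" "AGCAGCTATT" "TTCATGAATA"
fragment_offset = 1900  # zero-based offset of the first base of the fragment

signal_hits = [(fragment_offset + i, s) for i, s in find_polya_signals(fragment)]
print(signal_hits)
```

Under that zero-based reading the hexamer is found at offset 1903, which agrees with the reported signal position; in one-based numbering the same hexamer starts at base 1904.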


BLAST Results 


No BLAST result 
Medline entries 


No Medline entry 


Peptide Information for frame 1 


ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein 
Classification: Metabolism 

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105) 


1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 
51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL 
101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 
151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 
201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 
251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN 
301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS 
351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK 
401 MKNM 
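
The ORF coordinates and the peptide length quoted above are tied together by simple arithmetic, and the first codons can be translated directly from the listing. The following is a minimal sketch under stated assumptions: the helper names `peptide_length` and `translate` are hypothetical, the coordinates are taken to be one-based and inclusive with the stop codon excluded, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given one-based, inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# First 90 bases of the DKFZphtes3_15c24 insert, copied from the listing.
head = (
    "CGAAGGCGGCGGCGAAGGCCCGGGCTGGGAGCGTTGGCGGCCGGAGTCCC"
    "AGCCATGGCGGAGTCTGTGGAGCGCCTGCAGCAGCGGGTC"
)
orf_start = 55
first_codons = head[orf_start - 1:orf_start - 1 + 30]

print(peptide_length(55, 1266))  # ORF coordinates reported for frame 1
print(translate(first_codons))   # first ten residues of the peptide
```

The same length check applies to the other entries in this section, for example an ORF from 461 bp to 814 bp corresponds to 118 codons.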

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c24, frame 1 

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid 
T03F1., N = 1, Score = 1204, P = 1.9e-122 

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72 

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus 
fulgidus, N = 1, Score = 218, P = 1.8e-17 

TREMBL:AF0227964 gene: "moeB"; product: "MoeB"; Staphylococcus 
carnosus molybdenum cofactor biosynthetic gene cluster, complete 
sequence., N = 1, Score = 220, P = 3.7e-16 


>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 
Length = 419 

HSPs: 

Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122 
Identities = 241/367 (65%), Positives = 293/367 (79%) 

Query: 37 RVRIEKMSSEVVDSNPYSRLMALKRMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCG 96 

R +IEK+S+EVVDSNPYSRLMAL+RMGIV++YE+IR VA+VGVGGVGSV AEMLTRCG 
Sbjct: 48 RQKIEKLSAEVVDSNPYSRLMALQRMGIVNEYERIREKTVAVVGVGGVGSVVAEMLTRCG 107 

Query: 97 IGKLLLFDYDKVELANMNRLFFQPHQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVEN 156 
IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA TL ++NPDV EVHN+NITT++N 

Sbjct: 108 IGKLILFDYDKVEIANMNRLFYQPNQAGLSKVEAARDTLIHVNPDVQIEVHNFNITTMDN 167 

Query: 157 FQHFMDRISNGGLEEGKPVDLVLSCVDNFEARMTINTACNELGQTWMESGVSENAVSGHI 216 

F F++RI G L +GK +DLVLSCVDNFEARM +N ACNE Q WMESGVSENAVSGHI 
Sbjct: 168 FDTFVNRIRKGSLTDGK-IDLVLSCVDNFEARMAVNhlACNEENQIWMESGVSENAVSGHI 226 

Query: 217 QLIIPGESACFACAPPLVVAANIDEKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNF 276 

Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF 
Sbjct: 227 QYIEPGKTACFACVPPLVVASGIDERTLKRDGVCAASLPTTMAVVAGFLVMNTLKYLLNF 286 

Query: 277 GTVSFYLGYNAMQDFFPTMSMKPNPQCDDRNCRKQQEEYKKKVAALPKQ-EV-IQEEEEI 334 

G VS Y+GYNA+ DFFP S+KPNP CDD +C ++Q+EY++KVA P EV + EEE + 
Sbjct: 287 GEVSQYVGYNALSDFFPRDSrKPNPYCDDSHCLQRQKEYEEKVANQPVDLEVEVPEEETV 346 

Query: 335 IHEDNEWGIELVSEVSEEELKNFSGPVPDLPEGITVAYTIPKKQEDSVTELTVEDSGESL 394 

+HEDNEWGIELV+E SE + S + G+ AY P K+ D+ TEL+ + + 
Sbjct: 347 VHEDNEWGIELVNE-SEPSAEQSSSL--NAGTGLKFAYE-PIKR-DAQTELSPAQA--AT 399 

Query: 395 EDLMAKMKN 403 
D M +K+ 
Sbjct: 400 HDFMKSIKD 408 

Pedant information for DKFZphtes3_15c24, frame 1 

Report for DKFZphtes3_15c24.1 

[LENGTH] 404 
[MW] 44863.36 
[pI] 4.79 
[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-115 
[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 
[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 
[BLOCKS] BL01042A Homoserine dehydrogenase proteins 
[PIRKW] thiamine pyrophosphate 1e-07 
[PIRKW] molybdenum 5e-07 
[PIRKW] molybdopterin biosynthesis 5e-07 
[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12 
[PROSITE] D_2_HYDROXYACID_DH_1 1 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 8.66 % 

SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK 
SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc 
MEM 


SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP 

SEG xxxxxxxxx 

PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS 

SEG 

PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee 


MEM 


SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID 
SEG 

PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc 


MEM 


SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc 

MEM 

SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP 

SEG xxxxxxxxxxxxxxx . . . xxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc 

MEM 

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM 

SEG 

PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc 

MEM 
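
The LOW_COMPLEXITY keyword value in the report can be related to the SEG rows shown above. The sketch below assumes, without confirmation from the source, that the percentage is the fraction of residues masked with `x`; the function name `low_complexity_percent` is hypothetical, and the run lengths (9, 15 and 11) are as read from the SEG rows of this report.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Percentage of residues masked with 'x' in the SEG rows, to two decimals."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / peptide_length, 2)


# SEG rows for DKFZphtes3_15c24.1: one run of 9, then runs of 15 and 11
# separated by three unmasked residues (shown as dots in the report).
seg_rows = ["x" * 9, "x" * 15 + "..." + "x" * 11]
percent = low_complexity_percent(seg_rows, 404)
print(percent)
```

With 35 masked residues out of 404 this gives 8.66, in agreement with the reported keyword value.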


Prosite for DKFZphtes3_15c24.1 
PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063 

(No Pfam data available for DKFZphtes3_15c24.1) 



DKFZphtes3_15c6 


group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by GBF 
Locus : unknown 

Insert length: 1283 bp 

Poly A stretch at pos. 1264, no polyadenylation signal found 


1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 
51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 
101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 
151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 
201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGGCTTGGAG CCAGAGACAC 
251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 
301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 
351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 
401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 
451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT 
501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 
551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT 
601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 
651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTCCTC CTATCCTCAG 
701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC 
751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 
801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 
851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 
901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 
951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 
1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 
1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 
1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 
1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 
1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 
1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 


1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 

PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 76, P = 0.33 

>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana 
Length = 258 

HSPs: 

Score = 76 (11.4 bits). Expect = 4.0e-01, P = 3.3e-01 
Identities = 30/91 (32%), Positives = 44/91 (48%) 

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74 

CO + +E+ A C P SSL+ A G L 

Sbjct: 52 PGRGA-PLARVTFRH PFRF— KKQKELFVAAEVCTPVSSLYCGKKATLWGNVLP 103 

Query: 75 FSSYPQGPVLLCP---HV-PLGCLVEALYNFSLVL 105 

S P+G V+ C HV G L A ++++V+ 
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYAIVI 137 

Pedant information for DKFZphtes3_15c6, frame 2 

Report for DKFZphtes3_15c6.2 

[LENGTH] 118 
[MW] 12413.79 
[pI] 7.53 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 1 

SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF 

PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc 
MEM 

SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP 

PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc 

MEM MMMMMMMMMMMMMMMMM 


Prosite for DKFZphtes3_15c6.2 

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001 
PS00008 70->76 MYRISTYL PDOC00008 
PS00029 84->106 LEUCINE_ZIPPER PDOC00029 


(No Pfam data available for DKFZphtes3_15c6.2) 
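
The three Prosite hits listed for this peptide can be reproduced approximately with regular expressions. This is only a sketch: the regular expressions are hand transcriptions of the PROSITE patterns PS00001, PS00008 and PS00029 and should be treated as assumptions, the helper name `scan` is hypothetical, and the range convention (one-based start, end one past the last matched residue) is inferred from the table above rather than documented in the source.

```python
import re

# Assumed regular-expression forms of three PROSITE patterns:
#   PS00001  N-{P}-[ST]-{P}
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
#   PS00029  L-x(6)-L-x(6)-L-x(6)-L
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",
}


def scan(peptide, pattern):
    """Return (start, end) pairs for all matches, overlapping ones included.

    start is one-based; end is one past the last matched residue, which is
    the convention the table above appears to use.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(1) + 1, m.end(1) + 1) for m in lookahead.finditer(peptide)]


# Frame 2 peptide of DKFZphtes3_15c6, copied from the listing.
peptide = (
    "MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTA"
    "CACLPASSLFSFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYN"
    "FSLVLCSFLLYFPAVSCP"
)

hits = {name: scan(peptide, pattern) for name, pattern in PATTERNS.items()}
for name, ranges in hits.items():
    print(name, ranges)
```

The lookahead wrapper is used so that overlapping sites are not skipped, which matters for dense motifs such as the phosphorylation sites reported for the longer peptides in this section.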
DKFZphtes3_15gl4 


group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae 
hypothetical protein YOR243c. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to YOR243c 

complete cDNA, complete cds, potential start codon at Bp 35, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3495 bp 

Poly A stretch at pos. 3462, no polyadenylation signal found 


1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 
51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 
101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 
151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 
201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 
251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA 
301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 
351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 
401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 
451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 
501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 
551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 
601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 
651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 
701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 
751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 
801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 
851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 
901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 
951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 
1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 
1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 
1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 
1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 
1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 
1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 
1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 
1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 
1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT 
1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC 
1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 
1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 
1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 
1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 
1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 
1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 
1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 
1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA 
1901 CTACAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG 
1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA 
2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA 
2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC 
2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC 
2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA 
2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC 
2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT 
2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC 
2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG 
2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT 
2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC 
2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG 
2551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA 
2601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT 
2651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA 
2701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA 
2751 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGC 
2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG 
2851 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC 
2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA 
2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA 
3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG 
3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA 
3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA 
3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG 
3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG 
3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT 
3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA 
3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA 
3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG 
3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 


1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI 
51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN 
101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS 
151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN 
201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN 
251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV 
301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV 
351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK 
401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL 
451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA 
501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY 
551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG 
601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN 
651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD 
701 V 


BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphtes3_15gl4, frame 2 

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp 
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57 

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 516, P = 7.3e-54 

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN 
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34 


>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676 


HSPs: 


Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 151/498 (30%), Positives = 245/498 (49%) 

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249 
+ E V P L +L + EE+ Y A K + F+ +K R +H + 
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE---DKSVRTKIHQLL 164 

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307 

+ F N +E+ + N +EK ++ R + G + FTL 

Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224 

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366 

KEN + EA+ + KL +PS YAG KD-»-+A+T Q + + K+ +RL + + + 
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282 

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426 

K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+ 
Sbjct: 283 -KGMIIGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340 

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485 

NV+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA 
Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399 

Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ— AWFS LPHSMRIFYVHAYTSKIW 539 

L MP + E ALL +L E+G A+++ +P ++R YVHAY S +W 

Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459 

Query: 540 NEAVSYRLETYGARVVQGDLVC LDEDIDDENFPNS KIHLVTEEEGS 585 

N S R+E +G ++V GDLV L IDDE+F + VT+E+ 

Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519 

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644 

+ Y + VVLP G+++ YP N+ + Q Y OIL D + + ++ G YR + 

Sbjct: 520 SVKYTMEOWLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579 

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671 

++ P +L Y+++ D + + +D 
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606 

Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01 
Identities = 40/160 (25%), Positives = 77/160 (48%) 

Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81 

GF G IK +DF+V EID++G++++ T D+ FK+ + +P K + S E 

Sbjct: 55 GFRGQIKQRYTDFLVNEIDQEGKVIHLT-DKG-FKMPK— KPQR— SKEEVNAEKES-E 106 

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS—EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138 
R QE + D + +Q +ED + ++ + K + +F D+ ++ 

Sbjct: 107 AARRQEFNV DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKNFEDKSVRTKIH 161 

Query: 139 NFACDVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181 

+RE + ++ E + FIR ++N R + I Q 

Sbjct: 162 QL LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201 

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 10/23 (43%), Positives = 17/23 (73%) 

Query: 676 SLLISFDLDASCYATVCLKEIMK 698 

++++ F L S YAT+ L+E+MK 
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660 
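
The identity counts quoted for each HSP can be recomputed from the aligned rows. The sketch below uses the short final HSP shown above; the function name `identity_stats` is hypothetical, and the integer truncation of the percentage is an assumption based on the values printed in these reports (for example 30/91 is shown as 32%).

```python
def identity_stats(query, subject):
    """Count identical aligned residues and return (identical, aligned, percent)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    aligned = len(query)
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, aligned, identical * 100 // aligned  # truncated, not rounded


# Last HSP of the YOR243c alignment (query 676-698, subject 638-660).
query_row = "SLLISFDLDASCYATVCLKEIMK"
subject_row = "AVVLKFQLGTSAYATMALRELMK"

print(identity_stats(query_row, subject_row))
```

Counting positives in the same way would additionally require the substitution matrix used by the search, which is not stated in the report.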

Pedant information for DKFZphtes3_15gl4, frame 2 


Report for DKFZphtes3_15gl4.2 

[LENGTH] 701 
[MW] 80700.96 
[pI] 7.31 
[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e- 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53 
[BLOCKS] BL01268C 
[BLOCKS] BL01268B 
[BLOCKS] BL01268A 
[SUPFAM] hypothetical protein HI0701 3e-06 
[PROSITE] MYRISTYL 7 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 16 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 13 
[PROSITE] ASN_GLYCOSYLATION 5 
[KW] Alpha_Beta 
SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI 

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR 

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhheeecccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD 

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhcccccceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV 

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN 

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhheeeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN 

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG 

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeecccccccccccceeeccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD 

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV 

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 


Prosite for DKFZphtes3_15gl4.2 

PS00001 47->51 ASN_GLYCOSYLATION PDOC00001 
PS00001 77->81 ASN_GLYCOSYLATION PDOC00001 
PS00001 266->270 ASN_GLYCOSYLATION PDOC00001 
PS00001 404->408 ASN_GLYCOSYLATION PDOC00001 
PS00001 650->654 ASN_GLYCOSYLATION PDOC00001 
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 232->235 PKC_PHOSPHO_SITE PDOC00005 
PS00005 237->240 PKC_PHOSPHO_SITE PDOC00005 
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005 
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005 
PS00005 381->384 PKC_PHOSPHO_SITE PDOC00005 
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005 
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005 
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005 
PS00005 625->628 PKC_PHOSPHO_SITE PDOC00005 
PS00005 632->635 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006 
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006 
PS00006 211->215 CK2_PHOSPHO_SITE PDOC00006 
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 
PS00006 580->584 CK2_PHOSPHO_SITE PDOC00006 
PS00006 668->672 CK2_PHOSPHO_SITE PDOC00006 
PS00007 537->546 TYR_PHOSPHO_SITE PDOC00007 
PS00008 25->31 MYRISTYL PDOC00008 
PS00008 43->49 MYRISTYL PDOC00008 
PS00008 114->120 MYRISTYL PDOC00008 
PS00008 326->332 MYRISTYL PDOC00008 
PS00008 385->391 MYRISTYL PDOC00008 
PS00008 514->520 MYRISTYL PDOC00008 
PS00008 622->628 MYRISTYL PDOC00008 
PS00009 287->291 AMIDATION PDOC00009 
PS00009 436->440 AMIDATION PDOC00009 


(No Pfam data available for DKFZphtes3_15gl4.2) 
DKFZphtes3_15hl 


group: testes derived 

DKFZphtes3_15hl encodes a novel 672 amino acid protein with very weak similarity to several 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 


1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 
51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 
101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 
151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 
201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 
251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 
301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 
351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 
401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 
451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC 
501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 
551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA 
601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 
651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 
701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 
751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 
801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 
851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 
901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 
951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 
1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 
1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 
1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 
1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 
1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 
1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 
1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 
1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 
1401 CAGGCACAAG TGAAGCTGAG AGACTTGGAG TCAGCCGTGA ACAATTTTGA 
1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG 
1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 
1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 
1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 
1651 TGAGAGATGA GCCCGAGAAG GTGGTGAAGC AGTGGGACCA TAGTGAGGAT 
1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 
1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT 
1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 
1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 
1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 
1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 
2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 
2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 
2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 
2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 
2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 
2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 69 bp to 2084 bp; peptide length: 672 
Category: similarity to known protein 


1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL 
51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA 
101 LVFYHRGYKL RPDREFRVGI QKAQEAINNS VGSPSSIKLE NKGDLSFLSK 
151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY 
201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER 
251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV 
301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA 
351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF 
401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL 
451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV 
501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE 
551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL 
601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG 
651 KTQFGEIGET KKTGNEMEKE YE 


BLASTP hits 


Entry AF039202_1 from database TREMBL: 

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus 

Hsp70/Hsp90 organizing protein mRNA, complete cds . 

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160 

Entry AI09782_1 from database TREMBL: 

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain 

mRNA, complete cds. 

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623 

Entry S56658 from database PIR: 
stress-induced protein sti1 - soybean 

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153 
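
The identity and positive counts in each hit line are easier to compare as percentages. The following is a minimal sketch, assuming hit lines of the form shown above; `HIT_RE` and `parse_hit` are hypothetical names, not part of any BLAST tool.

```python
import re

# Pattern for summary lines of the form used in these reports (an assumption
# about the layout, based only on the lines quoted above).
HIT_RE = re.compile(
    r"Score\s*=\s*(?P<score>\d+),\s*P\s*=\s*(?P<p>[0-9.eE+-]+),\s*"
    r"identities\s*=\s*(?P<id_num>\d+)/(?P<id_den>\d+),\s*"
    r"positives\s*=\s*(?P<pos_num>\d+)/(?P<pos_den>\d+)"
)


def parse_hit(line: str) -> dict:
    """Extract score, P value and percent identity/positives from a hit line."""
    m = HIT_RE.search(line)
    if m is None:
        raise ValueError("not a recognisable hit line: %r" % line)
    g = m.groupdict()
    return {
        "score": int(g["score"]),
        "p": float(g["p"]),
        "pct_identity": 100.0 * int(g["id_num"]) / int(g["id_den"]),
        "pct_positive": 100.0 * int(g["pos_num"]) / int(g["pos_den"]),
    }


hit = parse_hit("Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160")
print(hit["pct_identity"], hit["pct_positive"])  # 26.25 46.25
```

Read this way, the three hits listed here share only about a quarter of their aligned residues with the query, which is in line with the category "similarity to known protein".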


Alert BLASTP hits for DKFZphtes3_15hl, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15hl, frame 3 

Report for DKFZphtes3_15hl.3 


[LENGTH] 672 
[MW] 76655.61 
[pI] 5.49 
[HOMOL] PIR:S56658 stress-induced protein sti1 - soybean 6e-10 
[SUPFAM] tetratricopeptide repeat homology 1e-07 
[PROSITE] MYRISTYL 7 
[PROSITE] AMIDATION 3 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 15 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 4.76 % 


SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM 

SEG 

PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh 

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI 

SEG 

PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh 

SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh 

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc 

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh 

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS 

SEG 

PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh 

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG 

SEG 

PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh 

SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh 

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD 

SEG X 

PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc 

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL 

SEG xxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh 

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET 

SEG 

PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc 

SEQ KKTGNEMEKEYE 

SEG 

PRD cccccccccccc 


Prosite for DKFZphtes3_15hl.3 

PS00001 128->132 ASN_GLYCOSYLATION PDOC00001 
PS00001 438->442 ASN_GLYCOSYLATION PDOC00001 
PS00004 265->269 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 605->609 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 613->617 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 636->640 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005 
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005 
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005 
PS00005 183->186 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 564->567 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00005 660->663 PKC_PHOSPHO_SITE PDOC00005 
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006 
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006 
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006 
PS00006 171->175 CK2_PHOSPHO_SITE PDOC00006 
PS00006 220->224 CK2_PHOSPHO_SITE PDOC00006 
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006 
PS00006 382->386 CK2_PHOSPHO_SITE PDOC00006 
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006 
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006 
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006 
PS00006 512->516 CK2_PHOSPHO_SITE PDOC00006 
PS00006 542->546 CK2_PHOSPHO_SITE PDOC00006 
PS00006 548->552 CK2_PHOSPHO_SITE PDOC00006 
PS00006 628->632 CK2_PHOSPHO_SITE PDOC00006 
PS00006 663->667 CK2_PHOSPHO_SITE PDOC00006 
PS00007 506->515 TYR_PHOSPHO_SITE PDOC00007 
PS00008 119->125 MYRISTYL PDOC00008 
PS00008 132->138 MYRISTYL PDOC00008 
PS00008 213->219 MYRISTYL PDOC00008 
PS00008 288->294 MYRISTYL PDOC00008 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 334->340 MYRISTYL PDOC00008 
PS00008 590->596 MYRISTYL PDOC00008 
PS00009 596->600 AMIDATION PDOC00009 
PS00009 603->607 AMIDATION PDOC00009 
PS00009 641->645 AMIDATION PDOC00009 


(No Pfam data available for DKFZphtes3_15hl.3) 
DKFZphtes3_15i5 


group: cell structure and motility 

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead 
proteins. 

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of 
flagella or axoneme and to the Strongylocentrotus purpuratus sea urchin spermatozoa protein 
p63. This protein is important for the maintenance of a planar form of sperm flagellar 
beating. In addition, the novel protein contains a transferrin signature 1 for iron-binding. 
The new protein seems to be a part of the human radial spoke heads in spermatozoa. 

BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in modulating the structure of the human spermatozoa 
radial spoke head and in modulating sperm motility in men. 


strong similarity to "radial spokehead" proteins 

complete cDNA, complete cds, 1 EST hit (from a testis library) 
"radial spokehead" is part of flagella in Chlamydomonas; this protein 
seems to be part of the sperm motor or tail 

Sequenced by GBF 

Locus : unknown 

Insert length: 2478 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433 


1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG 
51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG 
101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG 
151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG 
201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT 
251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA 
301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG 
351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA 
401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT 
451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG 
501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC 
551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC 
601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT 
651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG 
701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA 
751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG 
801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC 
851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC 
901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC 
951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC 
1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA 
1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC 
1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA 
1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC 
1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA 
1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC 
1301 GCACGGCGAG GAGGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG 
1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC 
1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT 
1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC 
1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC 
1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC 
1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA 
1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC 
1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA 
1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC 
1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG 
1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA 
1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC 
1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC 
2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC 
2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA 
2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC 
2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA 
2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG 
2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC 
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA 
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA 
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG 
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG 


BLAST Results 


No BLAST result 


Medline entries 


86251010: 

Molecular cloning and expression of flagellar radial spoke and dynein genes of Chlamydomonas. 

81142496: 

Radial spokes of Chlamydomonas flagella: polypeptide composition and phosphorylation of 
stalk components. 

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 


Peptide information for frame 3 


ORF from 144 bp to 2294 bp; peptide length: 717 
Category: strong similarity to known protein 


1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA 
51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP 
101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN 
151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ 
201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE 
251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP 
301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG 
351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV 
401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV 
451 NARKIKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY 
501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP 
551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE 
601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH 
651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT 
701 EEEEEGEEEE EGEETDD 


BLASTP hits 


Entry U73123_1 from database TREMBL: 

product: "radial spokehead"; Strongylocentrotus purpuratus radial 
spokehead mRNA, complete cds. 

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523 

Entry B44498 from database PIR: 

radial spoke protein 6 - Chlamydomonas reinhardtii 

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264 


Alert BLASTP hits for DKFZphtes3_15i5, frame 3 

No Alert BLASTP hits found 


Pedant information for DKFZphtes3_15i5, frame 3 


Report for DKFZphtes3_15i5.3 


[LENGTH] 717 
[MW] 80913.61 
[pI] 4.36 
[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds 1e-130 
[PROSITE] TRANSFERRIN_1 1 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 21.48 % 


SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccc^^ 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhh^ 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxxx 

PRD *»*^hhhcccccccccccccccccccccccccccccccccccccccccccc^^ 

SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhh^ 

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP 

SEG xxxxxxxxxxxxxxxx 

PRD hf^fichhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc 

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV 

SEG 

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhh^ 

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeeccccccccccc^ 

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY 

SEG 

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhccccccccc^ 

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN 

SEG xxxxxxxxxxxxx 

PRD *»**hhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecclihhh 

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc 

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA 

SEG 

PRD cccccccccccccccccccceeeeeeccccceeeecccccceeeeeecccccc^ 

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD 

SEG xxxxxxxxxxxxxx . . . xxxxxxxxxxxxxx . . . 

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 


Prosite for DKFZphtes3_15i5.3 

PS00001 244->248 ASN_GLYCOSYLATION PDOC00001 
PS00002 282->286 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 26->30 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005 
PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005 
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005 
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005 
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005 
PS00005 341->344 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 637->640 PKC_PHOSPHO_SITE PDOC00005 
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006 
PS00006 137->141 CK2_PHOSPHO_SITE PDOC00006 
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006 
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006 
PS00006 319->323 CK2_PHOSPHO_SITE PDOC00006 
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 563->567 CK2_PHOSPHO_SITE PDOC00006 
PS00006 671->675 CK2_PHOSPHO_SITE PDOC00006 
PS00006 682->686 CK2_PHOSPHO_SITE PDOC00006 
PS00006 700->704 CK2_PHOSPHO_SITE PDOC00006 
PS00007 639->646 TYR_PHOSPHO_SITE PDOC00007 
PS00008 284->290 MYRISTYL PDOC00008 
PS00008 315->321 MYRISTYL PDOC00008 
PS00008 350->356 MYRISTYL PDOC00008 
PS00008 435->441 MYRISTYL PDOC00008 
PS00008 475->481 MYRISTYL PDOC00008 
PS00009 16->20 AMIDATION PDOC00009 
PS00009 637->641 AMIDATION PDOC00009 
PS00205 619->628 TRANSFERRIN_1 PDOC00182 


(No Pfam data available for DKFZphtes3_15i5.3) 




WO 01/12659 PCT/IB00/01496 

DKFZphtes3_15j18 


group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 


unknown 

complete cDNA, complete cds, few EST hits 
Sequenced by GBF 
Locus : unknown 
Insert length: 905 bp 

Poly A stretch at pos. 839, polyadenylation signal at pos. 815 
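
The two positions quoted above can be reproduced by a simple string search. The sketch below is an illustration only: `polya_features` is a hypothetical helper, the canonical AATAAA hexamer and a minimum run of ten A residues are assumed search criteria, and positions are reported as 0-based offsets because that convention is what reproduces the quoted values. The fragment is nucleotides 801 to 850 of the insert listed below.

```python
import re


def polya_features(seq: str, offset: int = 0, min_run: int = 10) -> dict:
    """Locate the first AATAAA hexamer and the first run of >= min_run A's.

    Assumptions: the first AATAAA occurrence is taken as the signal, and
    positions are 0-based offsets into the full insert (offset + index).
    """
    signal = seq.find("AATAAA")
    run = re.search("A{%d,}" % min_run, seq)
    return {
        "signal": None if signal < 0 else offset + signal,
        "polya": None if run is None else offset + run.start(),
    }


# Nucleotides 801-850 of the DKFZphtes3_15j18 insert (0-based offset 800).
fragment = "GTTATGAATGTGGTCAATAAATGTTATTTGTGAAACTCTAAAAAAAAAAA"
print(polya_features(fragment, offset=800))  # {'signal': 815, 'polya': 839}
```

A stricter search might prefer the hexamer closest to the poly A stretch rather than the first one; for this fragment both choices coincide.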


1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 
51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGGAGGG CACAGCTTGT 
101 CCAGCTGGGA TGTTTGGGTG CCCTGTGAGA TGCCCCAAGC CACCAACCCA 
151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 
201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG 
251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC 
301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 
351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 
401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA 
451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG 
501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 
551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA 
601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 
651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 
701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 
751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT 
801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 
851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA 
901 AAAAG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 110 bp to 553 bp; peptide length: 148 
Category: putative protein 
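
The link between the ORF start and the first residues of the peptide can be illustrated by translating a short stretch of the insert with the standard genetic code. This is a sketch under stated assumptions: `translate` and `CODON_TABLE` are hypothetical names, the standard (nuclear) code is assumed, and the fragment is nucleotides 101 to 150 of the insert above, so that the ORF start at 110 bp corresponds to position 10 of the fragment.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int) -> str:
    """Translate from 1-based position start_bp up to a stop codon or the end."""
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 101-150 of the insert; the ORF start (110 bp) is position 10 here.
fragment = "CCAGCTGGGATGTTTGGGTGCCCTGTGAGATGCCCCAAGCCACCAACCCA"
print(translate(fragment, start_bp=10))  # MFGCPVRCPKPPT
```

The thirteen residues obtained match the beginning of the peptide listed below.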


1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA 
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15j18, frame 2 

Report for DKFZphtes3_15j18.2 


[LENGTH] 148 
[MW] 15665.78 
[pI] 8.91 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 1 
[KW] Irregular 


SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 

PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 

PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 

PRD cccccccccccccccccccccccccccc 


Prosite for DKFZphtes3_15j18.2 

PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 42->48 MYRISTYL PDOC00008 

PS00008 49->55 MYRISTYL PDOC00008 
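
The four hits in this table can be reproduced with regular expressions written from the corresponding PROSITE patterns: [ST]-x(2)-[DE] for PS00006, G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for PS00008 and [ST]-x-[RK] for PS00005. The following is a minimal sketch, not the tool used to produce the report; `PATTERNS` and `scan` are hypothetical names, and the coordinate convention (1-based start, end equal to start plus pattern length) is inferred from the rows above.

```python
import re

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
}

# The 148-residue peptide of DKFZphtes3_15j18, frame 2.
PEPTIDE = (
    "MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI"
    "SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH"
    "CGAPLPSAHGGFPRARAEGSWSQPGAGS"
)


def scan(peptide: str, pattern: str) -> list:
    """Return (start, end) pairs for all matches, including overlapping ones."""
    # A lookahead with a capture group lets overlapping sites be reported.
    rx = re.compile("(?=(%s))" % pattern)
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in rx.finditer(peptide)
    ]


for name in ("CK2_PHOSPHO_SITE", "MYRISTYL", "PKC_PHOSPHO_SITE"):
    print(name, scan(PEPTIDE, PATTERNS[name]))
# CK2_PHOSPHO_SITE [(82, 86)]
# MYRISTYL [(38, 44), (42, 48), (49, 55)]
# PKC_PHOSPHO_SITE []
```

The lookahead matters here because the myristoylation sites at 38 and 42 overlap; a plain non-overlapping search would miss the second one.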


(No Pfam data available for DKFZphtes3_15j18.2) 
DKFZphtes3_15j3 


group: nucleic acid management 

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins with 
unknown function. 

The novel protein contains an RNA recognition motif, predicted by Pfam, and therefore binds to 
RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein 
seems to be a new RNA-modifying protein. 

The new protein can find application in modulating the RNA metabolism in human cells and as a 
tool for biotechnologic manipulations. 


"44M2.3"; product, differences to genmodel, similarity to ribonuclease 
H 

complete cDNA, complete cds, EST hits 
YGR276c = ribonuclease H 
differences to genmodel of 44M2.3 

Sequenced by GBF 

Locus: /map="16p11.2" 

Insert length: 2695 bp 

Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579 


1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 
51 GCCCGTCTTC TCTTCTGCGT TTCCCGGGCT AGGGGGCGTG GGGAGTGGTT 
101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 
151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 
201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 
251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT 
301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 
351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 
401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 
451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA 
501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA 
551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT 
601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 
651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 
701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 
751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 
801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 
851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC 
901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 
951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 
1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC 
1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG 
1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG 
1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG 
1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA 
1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA 
1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC 
1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG 
1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT 
1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG 
1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA 
1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC 
1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT 
1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT 
1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG 
1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT 
1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG 
1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA 
1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 
1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 
2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 
2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 
2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 
2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 
2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC 
2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 
2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA 
2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG 
2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC 
2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC 
2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA 
2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG 
2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 188 bp to 2416 bp; peptide length: 743 
Category: similarity to known protein 


1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL 
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF 
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA 
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF 
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV 
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL 
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI 
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE 
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN 
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK 
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL 
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP 
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW 
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN 
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j3, frame 2 

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; 
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence., 
N = 2, Score = 1827, P = 2.1e-284 

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid 
C05C8., N = 2, Score = 370, P = 1.7e-34 

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 334, P = 1.8e-27 

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative 
exonuclease"; S.pombe chromosome I cosmid c637., N = 3, Score = 326, P 
= 2.8e-27 

>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo 
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 
Length = 547 

HSPs: 

Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 358/373 (95%), Positives = 358/373 (95%) 

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164 
           MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 
Sbjct:   1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60 

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224 
           AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 
Sbjct:  61 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120 

Query: 225 NSPLFGLDCEM---------------CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269 
           NSPLFGLDCEM               CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180 

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329 
           DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240 

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389 
           SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300 

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449 
           LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360 

Query: 450 NCQTIKCLSNKEV 462 
           NCQTIKCLSNKEV 
Sbjct: 361 NCQTIKCLSNKEV 373 

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 175/179 (97%), Positives = 177/179 (98%) 

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597 
           L  ++VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427 

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657 
           ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487 

Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716 
           STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546 


Pedant information for DKFZphtes3_15j3, frame 2 


Report for DKFZphtes3_15j3.2 


[LENGTH] 743 
[MW] 83536.58 
[pI] 8.87 
[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094c] 1e-10 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094c] 1e-10 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] RNA recognition motif. (aka RRM, RBD, or RNP domain) 
[KW] Alpha_Beta 


SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE 

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehlihhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY 

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL 

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh 
SEQ LPPDAVLVGHSLDLDLRALKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR 

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc 

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF 

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP 

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG 

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc 

SEQ SLSGLGLMGIKEEEESAGPGLCS 

PRD cccccccchhhhhhccccccccc 


Prosite for DKFZphtes3_15j3.2 

PS00001 219->223 ASN_GLYCOSYLATION PDOC00001 
PS00001 419->423 ASN_GLYCOSYLATION PDOC00001 
PS00002 723->727 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005 
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005 
PS00005 279->282 PKC_PHOSPHO_SITE PDOC00005 
PS00005 287->290 PKC_PHOSPHO_SITE PDOC00005 
PS00005 447->450 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005 
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005 
PS00005 605->608 PKC_PHOSPHO_SITE PDOC00005 
PS00005 630->633 PKC_PHOSPHO_SITE PDOC00005 
PS00005 643->646 PKC_PHOSPHO_SITE PDOC00005 
PS00005 658->661 PKC_PHOSPHO_SITE PDOC00005 
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005 
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005 
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 421->425 CK2_PHOSPHO_SITE PDOC00006 
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 630->634 CK2_PHOSPHO_SITE PDOC00006 
PS00007 370->379 TYR_PHOSPHO_SITE PDOC00007 
PS00008 27->33 MYRISTYL PDOC00008 
PS00008 186->192 MYRISTYL PDOC00008 
PS00008 575->581 MYRISTYL PDOC00008 
PS00008 714->720 MYRISTYL PDOC00008 
PS00008 720->726 MYRISTYL PDOC00008 
PS00009 337->341 AMIDATION PDOC00009 


Pfam for DKFZphtes3_15j3.2 


HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM * I YVGNLPWDtTEEDLr Dl FsQFGpIvs I rMMr DReTGRSRGFAFVEFED 

IY+ +++ +T -fE-fL + F ■•■ ■^ -I-++D G+ + •^+F +F++ 
Query 571 IYLSGVS-ETFKEQLLQEPRLFLGLEAVILPKDLKSGKQKKYCFLKFKS 618 

HMM EEDAekAIdeMNG. .meFrnGRrlRV*- 

+k+ A+ + G ++ GR + 
Query 619 FGSAQQALNILTGKDWKLKGRHALT 643 
group: signal transduction 

C-terminal identical with human
KIAA0781 protein and high similarity to protein kinases.

The novel protein contains a protein kinase ATP-binding region signature and a 
Troii'^^^e^Ty^c^^^^^^^ ^^^r^se was cloned 

The new protein can find application in modulation of intracellular signal pathways dependent
on this kinase.

KIAA0781, 5' extension 

complete cDNA, complete cds, potential start at Bp 97, EST hits 
Sequenced by GBF 
Locus: /map="11" 
Insert length: 4868 bp 

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 

1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 
51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGGGG CCCAGCATGG 

101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 

151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 

201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 

251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 

301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 

351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 

401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 

451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG 

501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 

551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 

601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 

651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 

701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 

751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 

801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 

851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 

901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 

951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 
1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 
1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 
1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 
1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA 
1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 
1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC 
1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 
1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 
1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 
1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 
1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 
1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 
1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 
1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 
1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 
1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG 
1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 
1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC 
1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 
1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 
2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 
2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 
2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 
2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 
2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 
2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 
2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 
2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 
2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT 
2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC 
2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA 
2551 GAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCGACCAC CACGACAGCC 
2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC 
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT 
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG 
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG 
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 
2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA 
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 
3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG 
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA 
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG 
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC 
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA 
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC 
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA 
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT 
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 
3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT 
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 
4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT 
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA 
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT 
4151 TATATTACTA ATAAAACTTA ACCAACAGTT ACAATTCAGT CATCAAAGTA 
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA 
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT 
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC 
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG 
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT 
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT 
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC 
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG 
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA 
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA 
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT 
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA 
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA 
4851 CGCGTGAAAA AAAAAAAG 


BLAST Results 


Entry HSG4921 from database EMBL: 
human STS SHGC-37164. 
Score = 1605, P = 1.9e-66, identities = 349/369 

Entry AB018324 from database EMBL: 

Homo sapiens mRNA for KIAA0781 protein, partial cds. 
Score = 10725, P = 0.0e+00, identities = 2145/2145 
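
The identity counts reported for the two EMBL hits above can be expressed as percentages. The following is a minimal sketch; the function name `percent_identity` is illustrative and not part of any BLAST tool.

```python
def percent_identity(matches: int, aligned: int) -> float:
    """Return matches/aligned as a percentage."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * matches / aligned


# Identity counts copied from the two EMBL hits listed above.
hits = {"HSG4921": (349, 369), "AB018324": (2145, 2145)}
for entry, (matches, aligned) in hits.items():
    print(f"{entry}: {percent_identity(matches, aligned):.1f}% identity")
```
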


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from the beginning to 2874 bp; peptide length: 959 
Category: known protein 


1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG 
51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV 
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE 
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK 
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD 
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH 
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL 
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA 
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV 
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH 
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD 
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ 
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ 
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ 
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ 
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF 
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL 
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP 
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ 
951 HNGYVLVN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15k11, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15k11, frame 1 


Report for DKFZphtes3_15k11.1 


[LENGTH] 926 

[MW] 103915.77 

[pI] 5.70 

[HOMOL] TREMBL:AB018324_1 gene: "KIAA0781"; product: "KIAA0781 protein"; Homo sapiens mRNA for KIAA0781 protein, partial cds. 0.0 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 4e-58 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-56 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 3e-56 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 1e-53 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 5e-42 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-34 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 3e-26 
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 1e-23 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-23 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-23 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 2e-23 
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 8e-21 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20 
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 3e-19 
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18 
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18 
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 4e-18 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17 
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-16 
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 2e-14 
01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae. 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 2e-14 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 5e-14 
[FUNCAT] c energy conversion [M. genitalium, MG109] 2e-12 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 4e-09 
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 1e-07 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 1e-07 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL154c] 2e-04 

[BLOCKS] BL00415A Synapsins proteins 

[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat (Rattus norvegicus 3e-78 
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 1e-81 
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 5e-89 
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 5e-86 
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 3e-80 
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 6e-70 
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-95 
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 7e-71 
[SCOP] d1ydse_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 2e-96 
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 2e-72 
[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 5e-97 
[SCOP] d2hckb3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 2e-68 
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53 
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 3e-78 
[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 1e-58 
[EC] 2.7.1.117 Myosin-light-chain kinase 3e-49 
[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 4e-78 
[EC] 2.7.1.38 Phosphorylase kinase 3e-41 
[EC] 2.7.1.37 Protein kinase 7e-45 
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42 
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78 

[PIRKW] phosphotransferase 3e-93 
[PIRKW] nucleus 2e-74 
[PIRKW] calcium 2e-40 
[PIRKW] transferase 3e-33 
[PIRKW] duplication 2e-32 
[PIRKW] tandem repeat 7e-45 
[PIRKW] phorbol ester binding 4e-33 
[PIRKW] zinc 4e-33 
[PIRKW] ion transport 1e-32 
[PIRKW] cell cycle control 1e-45 
[PIRKW] serine/threonine-specific protein kinase 2e-97 
[PIRKW] oncogene 1e-34 
[PIRKW] phospholipid binding 2e-32 
[PIRKW] autophosphorylation 2e-74 
[PIRKW] brain 6e-36 
[PIRKW] heterotetramer 8e-38 
[PIRKW] mitosis 1e-45 
[PIRKW] polymer 5e-41 
[PIRKW] magnesium 6e-80 
[PIRKW] ATP 2e-97 
[PIRKW] polyprotein 1e-34 
[PIRKW] alternative initiators 2e-31 
[PIRKW] phosphoprotein 2e-74 
[PIRKW] apoptosis 8e-38 
[PIRKW] cGMP binding 4e-33 
[PIRKW] glycoprotein 3e-36 
[PIRKW] skeletal muscle 8e-38 
[PIRKW] protein kinase 2e-50 
[PIRKW] testis 5e-41 
[PIRKW] cAMP binding 8e-38 
[PIRKW] transforming protein 4e-33 
[PIRKW] purine nucleotide binding 7e-52 
[PIRKW] calcium binding 7e-45 
[PIRKW] alternative splicing 5e-42 
[PIRKW] P-loop 7e-52 
[PIRKW] lipoprotein 8e-38 
[PIRKW] proto-oncogene 4e-33 
[PIRKW] segmentation 1e-34 
[PIRKW] core protein 1e-34 
[PIRKW] muscle 8e-38 
[PIRKW] myristylation 8e-38 
[PIRKW] EF hand 7e-45 
[PIRKW] cell division 3e-49 
[PIRKW] homodimer 1e-32 
[PIRKW] calmodulin binding 5e-42 

[SUPFAM] ribosomal protein S6 kinase II 1e-34 
[SUPFAM] calcium-dependent protein kinase 7e-45 
[SUPFAM] AMP-activated protein kinase 6e-80 
[SUPFAM] protein kinase akt 3e-36 
[SUPFAM] protein kinase SPK1 7e-41 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42 
[SUPFAM] calmodulin repeat homology 7e-45 
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33 
[SUPFAM] protein kinase DUN1 6e-36 
[SUPFAM] protein kinase C zeta 4e-33 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34 
[SUPFAM] death-associated protein kinase 8e-38 
[SUPFAM] pleckstrin repeat homology 3e-36 
[SUPFAM] ankyrin repeat homology 8e-38 
[SUPFAM] protein kinase homology 8e-99 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38 
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-33 
[SUPFAM] protein kinase C delta 2e-32 
[SUPFAM] cGMP-dependent protein kinase 3e-33 
[SUPFAM] protein kinase cdr1 1e-45 
[SUPFAM] kinase-related transforming protein 2e-50 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-42 
[SUPFAM] kinase interaction domain homology 7e-41 
[SUPFAM] gag-akt polyprotein 1e-34 
[PROSITE] PROTEIN_KINASE_ATP 1 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 15 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[PROSITE] PROTEIN_KINASE_ST 1 
[PFAM] Eukaryotic protein kinase domain 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 12.31 % 


SEQ MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVN 
SEG 
1ctpE EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC 

SEQ LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLNESEAR 
SEG 
1ctpE HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBH^ 

SEQ RKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFKSGELLATWCGSP 
SEG 
1ctpE HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC^ 

SEQ PYAAPEVFEGQQYEGPQLDIWSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRIPYFM 
SEG 
1ctpE GGCCHHHHHCCCBC-'HHHHHHHHHHHHHHliHHCCT^T'm'T^ 

SEQ SEDCEHLIRRMLVLDPSKRLTIAQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV 
SEG 
1ctpE CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG 

SEQ LRLMHSLGIDQQKTIESLQNKSYNHFAAIYFLLVERLKSHRSSFPVEQRLDGRQRRPSTI 
SEG 
1ctpE 

SEQ AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT 
SEG 
1ctpE 

SEQ PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ 
SEG xxxxxxxxxxx 
1ctpE 

SEQ RRHTLSEVTNQLVVMPGAGKIFSMNDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML 
SEG 
1ctpE 

SEQ ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRSPVSFREGRRASDTSLTQGIVAFRQ 
SEG 
1ctpE 

SEQ HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST 
SEG xxxxxxxxxxxxxxxx....xxxxxxxxxxxx. 
1ctpE 

SEQ LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK 
SEG xxxxxxxxxxxxx 
1ctpE 

SEQ RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP 
SEG xxxxxxxxxxx xxxxxxxxxxxxxxx 
1ctpE 

SEQ SSEQMQYSPFLSQYQEMQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
1ctpE 

SEQ PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL 
SEG xxx 
1ctpE 

SEQ SELPGLFDCEMLDAVDPQHNGYVLVN 
SEG 
1ctpE 


Prosite for DKFZphtes3_15k11.1 


PS00001 115->119 ASN_GLYCOSYLATION PDOC00001 
PS00001 320->324 ASN_GLYCOSYLATION PDOC00001 
PS00004 258->262 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 355->359 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 481->485 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 584->588 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 420->423 PKC_PHOSPHO_SITE PDOC00005 
PS00005 475->478 PKC_PHOSPHO_SITE PDOC00005 
PS00005 534->537 PKC_PHOSPHO_SITE PDOC00005 
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005 
PS00005 554->557 PKC_PHOSPHO_SITE PDOC00005 
PS00005 567->570 PKC_PHOSPHO_SITE PDOC00005 
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005 
PS00005 670->673 PKC_PHOSPHO_SITE PDOC00005 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006 
PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006 
PS00006 359->363 CK2_PHOSPHO_SITE PDOC00006 
PS00006 394->398 CK2_PHOSPHO_SITE PDOC00006 
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006 
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006 
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006 
PS00006 534->538 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 878->882 CK2_PHOSPHO_SITE PDOC00006 
PS00006 893->897 CK2_PHOSPHO_SITE PDOC00006 
PS00007 672->680 TYR_PHOSPHO_SITE PDOC00007 
PS00007 100->108 TYR_PHOSPHO_SITE PDOC00007 
PS00008 372->378 MYRISTYL PDOC00008 
PS00008 871->877 MYRISTYL PDOC00008 
PS00008 905->911 MYRISTYL PDOC00008 
PS00009 134->138 AMIDATION PDOC00009 
PS00009 582->586 AMIDATION PDOC00009 
PS00107 26->50 PROTEIN_KINASE_ATP PDOC00100 
PS00108 138->151 PROTEIN_KINASE_ST PDOC00100 
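
Each Prosite hit listed above pairs an accession, a residue range, a motif name and a documentation entry. The following is a minimal sketch, assuming each hit is written on one whitespace-separated row, of how such rows could be parsed and tallied per motif; the function name `parse_prosite_row` and the four sample rows are illustrative only.

```python
from collections import Counter


def parse_prosite_row(row: str):
    """Split 'ACCESSION START->END MOTIF PDOC' into typed fields."""
    accession, span, motif, pdoc = row.split()
    start, end = (int(x) for x in span.split("->"))
    return accession, start, end, motif, pdoc


# A few rows copied from the listing above (not the full table).
rows = [
    "PS00001 115->119 ASN_GLYCOSYLATION PDOC00001",
    "PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005",
    "PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005",
    "PS00107 26->50 PROTEIN_KINASE_ATP PDOC00100",
]
parsed = [parse_prosite_row(r) for r in rows]
counts = Counter(motif for _, _, _, motif, _ in parsed)
for motif, n in counts.items():
    print(motif, n)
```

Applied to all forty rows, the same tally should reproduce the per-motif counts given in the [PROSITE] summary lines of the Pedant report.
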


Pfam for DKFZphtes3_15k11.1 


HMM_NAME Eukaryotic protein kinase domain 
HMM *YeigRilGeGsFGtVYkCiWr .TGelVAIKIIkkrsms FlREI 

y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+ 

Query 20 YDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVNLEKIYREV 68 

HMM qlMRrLnHPNiiRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw 
QIM++L+HP+II++y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E 

Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 117 

HMM elrflMyQILrGMeYLHSMgllHRDLKPENILIDeNgqlKIcDFGLARqM 
E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++ 
Query 118 EARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFF 167 

HMM nnYerMttf CGTPWYMMAPEVIImg . nyYttkVDMWSFGCILWEMMTGep 

+++E++ T CG+P+Y APEV +G +y +++ D+WS+G++L-*- +++G + 
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 215 

HMM PFyddnMemlmrliqrf rrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 

PF++ ++ + + +++ R+++++ +SE++ +++R+++ +DP+KR+T+ QI 
Query 216 PFDGPTLPILRQRVLEGRFRIPYFMSEDCEHLIRRMLVLDPSKRLTIAQI 265 

HMM LnHPWF* 
+H W+ 

Query 266 KEHKWM 271 


PCT/IB00/01496 
265 
group: testes derived 

DKFZphtes3_15j18 encodes a novel 710 amino acid protein with weak similarity to neurofilament 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to neurofilament proteins 

Sequenced by GBF 

Locus: unknown 

Insert length: 2533 bp 

Poly A stretch at pos. 2507, no polyadenylation signal found 


1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 
51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 
101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 
151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 
201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 
251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 
301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 
351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 
401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 
451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 
501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 
551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 
601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 
651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 
701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 
751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 
801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 
851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG 
901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 
951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 
1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 
1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 
1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 
1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 
1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 
1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 
1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 
1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 
1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 
1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 
1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 
1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 
1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 
1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 
1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 
1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 
1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 
1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 
1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 
1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 
2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 
2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 
2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 
2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 
2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 
2251 ATTAAAGGGG ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 
2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG 
2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 
2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 
2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 
2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA 


BLAST Results 
No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 


1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS 
51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP 
101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA 
151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV 
201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS 
251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL 
301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL 
351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL 
401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ 
451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA 
501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE 
551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD 
601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV 
651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH 
701 IELKQRPPEL 
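
The ORF coordinates and the start of the peptide above can be cross-checked against the cDNA. The following is a minimal sketch, assuming the standard genetic code and 1-based positions as used in the listing; the names `CODON_TABLE` and `translate` are illustrative helpers, not part of any tool named in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int = 1) -> str:
    """Translate dna from a 1-based start until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 50 nucleotides of the cDNA listed above.
cdna_prefix = "CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA"
print(translate(cdna_prefix, start=18))

# An ORF spanning 18 bp to 2147 bp covers this many codons.
print((2147 - 18 + 1) // 3)
```

The ATG at positions 18 to 20 opens the reading frame, and the codon count matches the stated peptide length of 710.
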

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17f10, frame 3 

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P 
= 7.4e-43 

TREMBL:BNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 475, P = 1e-42 


>PIR:A37221 neurofilament triplet H protein - rat 
Length = 1,072 


Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43 
Identities = 185/622 (29%), Positives = 320/622 (51%) 


Query: 

33 

Sbjct: 

436 

Query: 

93 

Sbjct: 

496 

Query: 

153 

Sbjct: 

555 

Query: 

212 

Sbjct: 

610 

Query: 

269 

Sbjct: 

670 

Query: 

328 

Sbjct: 

722 

Query: 

384 


SE +1 V+ + + + 


+E 


+ + + ++ 


E Q 


E G + + TS 


++EE 


V+ P+T ++P 


K + AE + P+ K PA'*'++ P 


++ A 


+ A A++ +V+ P 


+ PAE + P+ 


+SP 


V+ PAE 


AE 


P++ +SP E + PAE 


KSP+ V+ E +SP+ K+P+ 


++P +V+ P 


++ +E + 


++P E A+ 


++PAE ++P 


E + P ++ AE S 


LAATEPPAD- ET PAEARS PLSEETS AEEAHAEVQS 383 

A + PA+ ++PAEA+SP+ E S E+A + V+ 

^ AEAKS P AEAKS P AEAKS P V - E VKS PEKAKS P VKE GAK 775 


384 PLAE ETTAEEAS AE I QLLAAI EAPAD- ET PAEAQS PLS EET - S AEEAPA- EVQS PS AKGV 440 
LAE^ + E+A + ++ 1+ PA+ ++P +A+SP+ EE S E+A +V+SP AK 
Sbjct: 776 SLAEAKSPEKAKSPVK— EEIKPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTP 833 

Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA— -EEAPAEVQPPPAEEAPAE 494 

+ EEA P +++ P ++ A EEA + + TE A EE + V+ A+E P + 
Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPKK 893 

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553 

V+ P EV+ +EAP E Q P AEE + P +++P E + EEA 

Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP— KDSPGEAKK— EEAKE 948 

Query: 554 EVQPPPAEEAPAEV QSLP AEETPIEETL— AAVHSPPADDVPAEEASVD-KHS 603 

+ P EE PA++ ■ ++ P AE+ +E + P ++VPA D K 

Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008 

Query: 604 PPADLLLTEEFPIGEASAEVSPP— PSEQT-PEDEALVENVSTEFQSPQ 649 
+ EE P +A A+ P E + P+ E ++ ST+ + Q 

Sbjct: 1009 KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDQKDSQ 1057 

Score = 473 (71.0 bits), Expect = 4.8e-42, P = 4.8e-42 
Identities = 184/628 (29%), Positives = 310/628 (49%) 

Query: 18 IPPVEKVDKEQQTYFSESEIVVISRP— DSSSTKSKEDALKHKSSGKIFASEHPEFQPA 74 
I VEK +KE ++E + ++ + E+ + + G+ A+ P + A 

Sbjct: 440 IKWEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 499 

Query: 75 TNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPS 134 

+ +E + + ++ KP E+ E P + AK+AE + P+ 
Sbjct: 500 ASPEKET-KSPVKEEAKSPAEAKSPA— EAKSPAEAKSPAEVKSPAEVK-SPAEAKSPA 554 

Query: 135 TEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQ-PSTEETPDAEAATAVAENSVKVQPPP 193 

K PA+++ P ++ A+A+ ++ +V+ P+T ++P + A A++ +V+ P 

Sbjct: 555 EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 614 

Query: 194 AEEAPL-VEFPAEIQPPSAEESPS-VELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS 250 

++P + PA ++ P +SP+ + AE+ P+ +SP E + PAE+ P KSP+ 
Sbjct: 615 EAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPA 674 

Query: 251 -VELLGEIRSPSAQKAPIEVQ-PLPAEGALE-EAPAKVEPPTVEETLAEVQPLLPEEAPR 307 

+ E++SP++ K+P E + P A+ E ++P + P ++ AE +P ++P 
Sbjct: 675 EAKSPVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPA 734 

Query: 308 EEARELQLSTAME— TPAE-EAPTEFQSP LP-KE— TTAEEASAEIQLLAATE— 354 

E + + E +PAE ++P E +SP P KE + AE S E E 

Sbjct: 735 EAKSPAEAKSPAEAKSPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEI 794 

Query: 355 -PPAD-ETPAEARSPLSEET-SAEEAHA-EVQSPLAEETTAEEAS— AEIQLLAAIEAPA 408 

PPA+ ++P +A+SP+ EE S E+A +V+SP A+ EEA A+I+ +++PA 
Sbjct: 795 KPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPA 854 

Query: 409 DETPAEAQSPLSEETSAEE-APA— EVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAA 465 

E EA+SP EET E+ AP EV+SP +EE +PP E EE + A 

Sbjct: 855 KE— EAKSPEKEETRTEKVAPKKEEVKSP VEEVKAK-EPPKKVE— EEKTPA 901 

Query: 466 IQLLAATEASAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAE 525 

E+ +EAP E Q P AEE + P +++P E + A+E A P E 

Sbjct: 902 TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP— KDSPGEAKKEEAKEKKAAA PEE 956 

Query: 526 EAPAEV QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581 

E PA++ + P E+A P++ PSE + P EE PA + +E E+ 

Sbjct: 957 ETPAKLGVKEEAKPKEKAEDAKAKEPSK— PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014 

Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636 

P EE DK P TE+ ++ + PSE+ PED+A 

Sbjct: 1015 KPEEKPKMQAKAKEE— -DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067 

Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36 
Identities = 162/540 (30%), Positives = 275/540 (50%) 

Query: 135 TEKFPAKIQPPLVEEATAKAEPR PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189 

TE P KI P + K+E + +E+ V V+ TEE E T E + 

Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKWEKSEKETVIVEEQTEEIQVTEEVTE— EEDKEA 474 

Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE— SPSE-EPPAEILPPPAE 246 

Q EEA A P AEE+ S E E P EE SP+E + PAE P 

Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE— KETKSPVKEEAKSPAEAKSPAEAKSPAEA 532 

Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

KSP+ E++SP+ K+P E + PAE ++pA+v+ p ++ AE + ++P 
Sbjct: 533 KSPA EVKSPAEVKSPAEAKS-PAEA KSPAEVKSPATVKSPAEAKSPAEAKSP 583 
Query: 307 REEARELQLSTAME— TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361 

E + + E +PAE ++P E +SP+ ++ AE S A ++ + PA+ ++P 

Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643 

Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPL 419 

AE +SP + ++ E ++ AEV+SP+ ++ AE A + ++ +++PA ++P+EA+SP 
Sbjct: 644 AEVKSPATVKSPVEAKSPAEVKSPVTVKSPAE-AKSPVE VKSPASVKSPSEAKSP- 697 

Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478 
+ ++PAE +SP AK + ++P E +PP+ ++ AE S A A + A A+ 

Sbjct: 698 AGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSPAE AKSPAEAK- 749 

Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA— EVQPPPAEEAPAEVQPP 534 

+ PAE + P ++P + + P EA AE + P ++P E++PP ++P + + P 
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809 

Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA— PAEVQSLPAEETPIEETLAAVHSPPADDV 592 

EEA + + + E + P EEA PA+++S ++P *Z SP ++ 

Sbjct: 810 MKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKEE— AKSPEKEET 866 

Query: 593 PAEEASVDKHS— PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650 

E++K P+++EP + E P + +T E++ EQP+ 

Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP— KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924 

Query: 651 AGIPAVKLGSVVLEGEAKFEEVSK 674 

+ GEAK EE + 

Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948 

Score = 406 (60.9 bits). Expect = 1.7e-34, P = 1.7e-34 
Identities = 123/390 (31%), Positives = 213/390 (54%) 

Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA EA 364 

E+ E+Q-I-+ E EE E Q +E AEE E AT PPA+E + E 
Sbjct: 455 EQTEEIQVT— EEVTEEEDKEAQGE— EEEEAEEGGEEA ATTSPPAEEAASPEKET 506 

Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP + 
Sbjct: 507 KSPVKEEAKSP AEAKSPAEAKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVK 563 

Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480 

+ A ++PAE +SP+ AK + ++P ++ P GE + EA + ++ + EA ++P 
Sbjct: 564 SPATVKSPAEAKSPAEAKSPAEVKSPATVKSP-GEAKSPAEAKSPAEVKSPVEA KSP 619 

Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540 

AE + P + + + P E + P ++PAEV+ P ++P E + P ++P V+ P ++P 
Sbjct: 620 AEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPAEAKSP 679 

Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPAD-DVPAEEASV 599 

EV+ P + ++PSE + P ++PAE +S ++P E A PPA+ PAE S 
Sbjct: 680 VEVKSPASVKSPSEAKSPAGAKSPAEAKSPWAKSPAEAKSPAEAKPPAEAKSPAEAKSP 739 

Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659 

+ PA+ E ++ EV P ++P E ++++ E +SP+ A P VK 
Sbjct: 740 AEAKSPAEAKSPAE AKSPVEVKSPEKAKSPVKEG-AKSLA-EAKSPEKAKSP-VK-E 792 

Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697 
+ E K E +K S +K+ + + + +A TL+++S 

Sbjct: 793 EIKPPAEVKSPEKAK — SPMKEEAKSPE-KAKTLDVKS 827 

Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18 
Identities = 124/420 (29%), Positives = 199/420 (47%) 

Query: 252 ELLGEIRSPSAQKAPIEVQPLPA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

ELLG+I+ A +A + + A AL E A++E TV+ TL + 

Sbjct: 236 ELLGQIQGCGAAQAQAQAEARDALKCDVTSALREIRAQLEGHTVQSTLOSEEWFRVRLDR 295 

Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366 

EA ++ + AM + EE TE++ L TT E++ L +T+ + +E 

Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT ELEALKSTKESLERQRSELED 347 

Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA--EIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+ S ++A ++ + L T E A+ E Q L ++ D E A + EE 
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-RNTKWEMAAQLREYQDLLNVKMALDIEIAAYRKLLEGEE 406 

Query: 423 TSAEEAPAEV QS PS -AKGVS I E-EAPLELQPPSGEETT-AEEASAAIQLLA- A 471 

P+ + PS + + ++ E +++ S +ET EE + IQ+ 

Sbjct: 407 CRIGFGPSPFSLTEGLPKIPSMSTHIKVKSEEKIKWEKSEKETVIVEEQTEEIQVTEEV 466 

Query: 472 TEASAEEAPAEVQPPPAEEAPAEVQP— PPAEEAPA EVQPPPAEEA— PAEVQPPPA 524 

TE +EA E + AEE E PPAEEA + E + P EEA PAE + P 

Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTSPPAEEAASPEKETKSPVKEEAKSPAEAKSPAE 525 

Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582 

++PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A 

Sbjct: 526 AKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584 

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 

V SP P E S PA++ E ++ AE PS ++P E ++ E 

Sbjct: 585 EVKSPATVKSPGEAKSPAEAKSPAEVKSPVE---AKSPAEAKSPASVKSPGEAKSPAEAK 641 

Query: 642 S-TEFQSPQVAGIP 654 

S E +SP P 

Sbjct: 642 SPAEVKSPATVKSP 655 


Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18 
Identities = 115/364 (31%), Positives = 166/364 (45%) 


Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE +PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 

Sbjct: 705 EAKSPVVAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762 

Query: 168 PSTEETPDAEAATAVAE--NSVKVQPPPAEEA-PL-VEFPAEIQPPSAEE--SPSVELL 220 

P ++P E A ++AE + K + P EE P V+ P + + P EE SP 

Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKSPEKAKT 822 



Query: 221 AEILPPSAEESPSEEP — PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE— 275 

++ P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
Sbjct: 823 LDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKE EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+EE AK P VEE E P P+ +E ++ A + AEE P TE 

Sbjct: 880 SPVEEVKAKEPPKKVEE EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936 

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE — ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

Sbjct: 937 SPGEAKKEEAKEK KAAA— PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAIEAPADETPAEAQSPLSEETSAEEAPAEVQSPSA-KGVSIEEAPLE 448 

EE A + E E+ + P + + EE Q PS K E++ 

Sbjct: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

Sbjct: 1052 DQKDSQPSEKAPEDKAA 1068 


Pedant information for DKFZphtes3_17f10, frame 3 


Report for DKFZphtes3_17f10.3 


[LENGTH] 710 

[MW] 75131.94 

[pI] 4.02 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 34.08 % 


SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS 

SEG 

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc 

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS 

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhccccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG xxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc 

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccccccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx . . . .xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 
SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS 

SEG XXXX xxxxxxxxxxxx xxxxxxxxxxxx xxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 

SEG xxxxxxxxxxx xxxxxxxxxxx xxxxxxxx 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS 

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIELKQRPPEL 

SEG 

PRD eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 

(No Prosite data available for DKFZphtes3_17f10.3) 
(No Pfam data available for DKFZphtes3_17f10.3) 
DKFZphtes3_17117 


group: metabolism 

DKFZphtes3_17117 encodes a novel 626 amino acid protein with similarity to transketolases (EC 
2.2.1.1). 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new testis-specific 
transketolase. Transketolase requires thiamin pyrophosphate as cofactor and shows a wide 
specificity for both reactants, e.g. it converts hydroxypyruvate and R-CHO into CO(2) and 
R-CHOH-CO-CH(2)OH. 

The new protein can find application in modulation of metabolic pathways involving this 
transketolase activity and as a new enzyme for biotechnological production processes. 


strong similarity to transketolases 

few EST hits (all from testis or pooled libraries containing testis) 
testis specific transketolase? 

Sequenced by GBF 

Locus: unknown 

Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630 


1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT 
51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG 
101 CCACGTGTGC CTCTGGTTCT GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 
151 GAGGTCGTGT CTGTCCTCTT CTTCCACACG ATGAAGTATA AACAGACAGA 
201 CCCAGAACAC CCGGACAACG ACCGGTTCAT CCTCTCCAGG GGACATGCTG 
251 CTCCTATCCT CTATGCTGCT TGGGTGGAGG TGGGTGACAT CAGTGAATCT 
301 GACTTGCTGA ACCTGAGGAA ACTTCACAGC GACTTGGAGA GACACCCTAC 
351 CCCGCGATTG CCGTTTGTTG ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 
401 TAGGTACTGC ATGTGGAATG GCTTATACTG GCAAGTACCT TGACAAGGCC 
451 AGCTACCGGG TGTTCTGCCT TATGGGAGAT GGCGAATCCT CAGAAGGCTC 
501 TGTGTGGGAG GCTTTTGCTT TTGCCTCCCA CTACAACTTG GACAATCTCG 
551 TGGCGGTCTT CGACGTGAAC CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 
601 GAGCATGGCG CAGACATCTA CCAGAATTGC TGTGAAGCCT TTGGATGGAA 
651 TACTTACTTA GTGGATGGCC ATGATGTGGA GGCCTTGTGC CAAGCATTTT 
701 GGCAAGCAAG TCAAGTGAAG AACAAGCCTA CTGCTATAGT TGCCAAGACC 
751 TTCAAAGGTC GGGGTATTCC AAATATTGAG GATGCAGAAA ATTGGCATGG 
801 AAAGCCAGTG CCAAAAGAAA GAGCAGATGC AATTGTCAAA TTAATTGAGA 
851 GTCAGATACA GACCAATGAG AATCTCATAC CAAAATCGCC TGTGGAAGAC 
901 TCACCTCAAA TAAGCATCAC AGATATAAAA ATGACCTCCC CACCTGCTTA 
951 CAAAGTTGGT GACAAGATAG CTACTCAGAA AACATATGGT TTGGCTCTGG 
1001 CTAAACTGGG CCGTGCAAAT GAAAGAGTTA TTGTTCTGAG TGGTGACACG 
1051 ATGAACTCCA CCTTTTCTGA GATATTCAGG AAAGAACACC CTGAGCGTTT 
1101 CATAGAGTGT ATTATTGCTG AACAAAACAT GGTAAGTGTG GCACTAGGCT 
1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 
1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 
1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 
1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 
1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 
1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG 
1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 
1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 
1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 
1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 
1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 
1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 
1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 
1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 
1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 
1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 
1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 
2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 
2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 
2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 
2151 AAATAAAACA ACTACCTAAT ACT^TATTT CTGATAAGAC TACAAATATC 
2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 
2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 
2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 
2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 
2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA 
2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA 
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTCAGA 
2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA 
2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA 
2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


96214928: 

Amplification of the transketolase gene in desensitization-resistant 
mutant Y1 mouse adrenocortical tumor cells. 
99123875: 

Properties and functions of the thiamin diphosphate dependent enzyme 
transketolase. 


Peptide information for frame 1 


ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATPGTPA (595-603) 


1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS 

51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN 

101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV 

151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA 

201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR 

251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI 

301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST 

351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA 

401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF 

451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR 

501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII 

551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG 

601 KTSELLDMFG ISTRHIIAAV TLTLMK 
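
The relationship between the stated ORF coordinates and the peptide length can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and 1-based, inclusive ORF coordinates that exclude the stop codon; the function names `orf_codon_count` and `translate` are illustrative and do not come from the original report.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codon_count(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna, start=1):
    """Translate from 1-based position `start` until a stop codon
    or until fewer than three bases remain. Spaces are ignored;
    any character outside T, C, A, G raises KeyError."""
    seq = dna.upper().replace(" ", "")
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First line of the cDNA listing (positions 1-50), ORF starting at 13 bp.
first_line = "GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT"
print(translate(first_line, start=13))
print(orf_codon_count(13, 1890))
```

Under these assumptions the first twelve translated residues agree with the start of the peptide listing, and the coordinate span 13 to 1890 corresponds to 626 codons.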

BLASTP hits 

NO BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17117, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1, 
Score = 2222, P = 2.5e-230 

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2202, P = 3.3e-228 

TREMBL:RN09256_1 product: "transketolase"; Rattus norvegicus 
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202, 
P = 3.3e-228 

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2200, P = 5.3e-228 

>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68) . 
Length =623 

HSPs: 

Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66 
KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP + 
Sbjct: 6 KPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRN 65 

Query: 67 PDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVDVATGSL 126 
P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL 

Sbjct: 66 PHNDRFVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSL 125 

Query: 127 GQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLVAVFDVN 186 

GQGLG ACGMAYTGKY DKASYRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N 
Sbjct: 126 GQGLGAACGMAYTGKYFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDIN 185 

Query: 187 RLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPTAIVAKT 246 

RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT 
Sbjct: 186 RLGQSDPAPLQHQVDIYQKRCEAFGWHTIIVDGHSVEELCKAFGQA KHQPTAIIAKT 242 

Query: 247 FKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQISITDIK 306 

FKGRGI lED E WHGKP+PK A+ I++ I SQ+Q+ + ++ P ED+P + I +1 + 
Sbjct: 243 FKGRGITGIEDKEAWHGKPLPKMMAEQIIQEIYSQVQSKKKILATPPQEDAPSVDIANIR 302 

Query: 307 MTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHPERFIEC 366 

M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC 
Sbjct: 303 MPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSELFKKEHPDRFIEC 362 

Query: 367 IIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSHCGVSTG 426 

IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G 
Sbjct: 363 YIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVSIG 422 

Query: 427 EDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPETAVIYT 486 

EDG SQMALEDLAMFRS+P TVFYPSD ++TE A+ LAANTKG+CFIRTS+PE A+IY+ 
Sbjct: 423 EDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYS 482 

Query: 487 PQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFTIKPLDA 546 

E+F++GQAKVV +D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD 
Sbjct: 483 NNEDFQVGQAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDR 542 

Query: 547 ATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRGKTSELL 606 

I+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V +LAVS VP+ GK +ELL 
Sbjct: 543 KLILDSARATKGRILTVEDHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELL 602 

Query: 607 DMFGISTRHIIAAV 620 

MFGI I+ AV 
Sbjct: 603 KMFGIDKDAIVQAV 616 
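
The summary figures in an HSP header can be cross-checked against each other. The sketch below is illustrative only: the helper names `parse_hsp_header` and `floor_percent` are assumptions, and the observation that the printed percentages correspond to rounding down (rather than rounding to nearest) is inferred from the headers shown in this listing, not from any documentation of the search program.

```python
import re

SCORE_RE = re.compile(r"Score = (\d+) \(([\d.]+) bits\)")
FRACTION_RE = re.compile(r"(Identities|Positives) = (\d+)/(\d+) \((\d+)%\)")


def parse_hsp_header(text):
    """Extract raw score, bit score and the identity/positive counts
    from the two header lines of one HSP."""
    score_match = SCORE_RE.search(text)
    if score_match is None:
        raise ValueError("no 'Score = N (X bits)' field found")
    result = {
        "score": int(score_match.group(1)),
        "bits": float(score_match.group(2)),
    }
    for name, num, den, pct in FRACTION_RE.findall(text):
        result[name.lower()] = (int(num), int(den), int(pct))
    return result


def floor_percent(numerator, denominator):
    """Integer percentage, rounded down."""
    return (100 * numerator) // denominator


header = (
    "Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230\n"
    "Identities = 417/614 (67%), Positives = 501/614 (81%)"
)
hsp = parse_hsp_header(header)
matches, aligned, printed = hsp["identities"]
print(floor_percent(matches, aligned), printed)
print(round(hsp["bits"] / hsp["score"], 3))
```

For the headers in this listing the ratio of bit score to raw score is close to 0.15 in every case; that is an empirical observation about these numbers, not a statement about the scoring system in general.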

Pedant information for DKFZphtes3_17117, frame 1 


Report for DKFZphtes3_17117.1 


[LENGTH] 626 
[MW] 67877.52 
[pI] 5.90 
[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0 
[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48 
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17 
[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09 
[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[BLOCKS] BL00801F 
[BLOCKS] BL00801E 
[BLOCKS] BL00801D Transketolase proteins 
[BLOCKS] BL00801C Transketolase proteins 
[BLOCKS] BL00801B Transketolase proteins 
[BLOCKS] BL00801A Transketolase proteins 
[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21 
[EC] 1.2.4.1 Pyruvate dehydrogenase (lipoamide) 8e-11 
[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10 
[EC] 2.2.1.1 Transketolase 0.0 
[EC] 2.2.1.3 Formaldehyde transketolase 1e-20 
[PIRKW] transferase 0.0 
[PIRKW] flavoprotein 2e-07 
[PIRKW] Calvin cycle 1e-40 
[PIRKW] heterotetramer 2e-07 
[PIRKW] pentose phosphate pathway 0.0 
[PIRKW] magnesium 1e-40 
[PIRKW] thiamine pyrophosphate 0.0 
[PIRKW] oxidoreductase 7e-12 
[PIRKW] fatty acid biosynthesis 4e-10 
[PIRKW] mitochondrion 2e-07 
[PIRKW] peroxisome 1e-20 
[PIRKW] homodimer 1e-40 
[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06 
[SUPFAM] pyruvate dehydrogenase (lipoamide) beta chain 7e-12 
[SUPFAM] ferredoxin 2[4Fe-4S]-related protein 8e-47 
[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0 
[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08 
[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47 
[SUPFAM] hypothetical protein C2814 2e-21 
[SUPFAM] transketolase 0.0 
[PROSITE] ATP_GTP_A 1 
[PFAM] Transketolase 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.04 % 


SEQ MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK 

SEG 

IngsB HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT 

SEQ QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD 

SEG 

SEQ VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLV 

SEG 

IngsB CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTT^ 

SEQ AVFDVNRLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT 


SEG 

IngsB EEEEECCEETTEEG<3GCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE 

SEQ AIVAKTFKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQI 

SEG 

IngsB EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCH^ 

SEQ SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHP 

SEG 

IngsB HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHHHHTTTTTEEEEETTTHHHHCCTTCEE^^ 

SEQ ERFIECIIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSH 

SEG xxxxxxxxxxxxxxxxxxx 

IngsB GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGHHHHHHHHHHCTTTEEEEEC 

SEQ CGVSTGEDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPE 

SEG 

IngsB CCGGGTTTTTTTTCCHHHHHHHCTTTTEEECCCCHHHHHHHHHHHTTTTCEEEECCCCCB 

SEQ TAVIYTPQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFT 

SEG 

IngsB CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE. . . ! 

SEQ IKPLDAATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG 

SEG 

IngsB 

SEQ KTSELLDMFGISTRHIIAAVTLTLMK 

SEG 

IngsB 


Prosite for DKFZphtes3_17117.1 
PS00017 595->603 ATP_GTP_A PDOC00017 


Pfam for DKFZphtes3_17117.1 


HMM_NAME Transketolase 

HMM *vNtIRiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN 
+N++RI ++ A + +SG ++++++A++ VL++++M+++++DP P+ 
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68 

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT 

+DRF+LS GHA+++LY+ W + G ++++DL+++R++HS++ +HP ++ 
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117 

HMM PGVEVTTGPLGQGIdNaVWMAIAERnLAATYNRPGFDIfDHYTYCFMGDG 
++ +V+TG+I.GQG++ +++++Y++++ D+++++++C+MGDG 

Query 118 FV-DVATGSLGQGLG TACGMAYTGKYLDKASYRVFCLMGDG 157 

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSIDGdTdlWFqEDtYakRF 
+ +EG++WEA ++A+H++I,+N++A +D NR++++G+++-I- + D+Y+ + 
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADIYQNCC 207 

HMM EAYGMHVIEVEnDGHDvEelcaAIEeAKaekDRPTLIiCRTVIGYGSPNk 

EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN 
Query 208 EAFGWNTYLV— DGHDVEALCQAFWQASQVKNKPTAIVAKTFKGRGIPNI 255 

HMM QGTHdWHGAPLGeD* 

++ + WHG+P +++ 
Query 256 EDAENWHGKPVPKE 269 

HMM * PqWePnddklATRKASQqaLeaiGPaLPEfWGGSADLTPSNLTrWKGmv 

P++++ +DKIAT K+++ AL+++G A +++ +S+D+ +S++++++ ++ 
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKE 358 

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGIAlHGgNFRPYGGT 

+ + R+r++ I+E++M++++ G+A++G+ ++++ G 

Query 359 H ' PERFIECIIAEQNMVSVALGCATRGR-TIAFAGA 392 

HMM FMMFyDYARPAIRMAALMelPVIWVWTHDSIGLGEDGPTHQPVEHLAHFR 
F++F+++A++++RM A++ +++++++H++++ GEDG +++++E+LA+FR 
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442 

HMM a I PNMs VWRPCDgNETa yAWyl AvERehTP t iLI LSRQNLPQl E r N Prq f 

+IPN +V++P+D+ T+ A yLA++++-l- +++++S ++ +++++ P + 
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490 

HMM ekvaRGGYVLkDmdnePDVILIATGSEMELAvaAAKlLadEGIkaRVVSM 
++++++++V + + + V++I++G+++++A++AA+ L+ +GI +RV+++ 
Query 491 FEIGQAKVVRHGVN--DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538 

HMM PCTeWFD kQDeEYReSVLPDhVPqRVaVEmGvtWCWYKYVGqq 

++++++D + ++++R +++DH++ +++++++V ++ +++ + 

Query 539 FTIKPLDAATIISSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587 

HMM GalfGMNrFGESSGKAPpevLYkMFGFTPENI* 

+ +++ +++ ++ +L+ MFG+ +1 

Query 588 VHQLAVSGVPQR GKTSELLDMFGISTRHI 616 
DKFZphtes3_17n12 


group: transcription factors 

DKFZphtes3_17n12.1 encodes a novel 804 amino acid protein which is nearly identical to mouse 
and trout SOX-LZ. 

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the 
regulation of developmental processes such as germ layer formation, organ development and cell type 
specification. Deletion or mutation of Sox proteins often results in developmental defects and 
congenital disease in humans. Sox proteins perform their function in a complex interplay with 
other transcription factors in a manner highly dependent on cell type and promoter context. 
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper. 

The new protein can find application in modulating/blocking the expression of SOX-controlled 
genes. 


nearly identical to mouse SOX-LZ 


complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis 

Sequenced by GBF 


Locus: unknown 


Insert length: 2802 bp 

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660 


1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA 
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA 
501 CGAAGGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAAGCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA 
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 
1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT 
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT 
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA 
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG 
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA 
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT 
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA 
2451 TGGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG 
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA 
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG 
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA 
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA 
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AA 
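
The polyadenylation signal positions quoted for each insert can be located directly from the numbered listing. The following is a minimal sketch, assuming the two common signal hexamers AATAAA and ATTAAA and 1-based coordinates; the helper names `parse_listing` and `find_polya_signals` are illustrative. Note that a 1-based search reports each hexamer one position later than the value given in the clone summaries above (for example 2661 rather than 2660), which suggests the summaries use a different offset convention; that interpretation is an assumption.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def parse_listing(lines):
    """Join numbered lines of 10-base groups into one sequence.
    Returns (position of first base, sequence)."""
    first_position = None
    chunks = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if first_position is None:
            first_position = int(fields[0])
        chunks.append("".join(fields[1:]).upper())
    return first_position, "".join(chunks)


def find_polya_signals(seq, first_position=1):
    """Return (1-based position, hexamer) for every signal hexamer,
    including overlapping occurrences."""
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in POLYA_SIGNALS:
            hits.append((first_position + i, hexamer))
    return hits


listing = ["2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA"]
start, seq = parse_listing(listing)
print(find_polya_signals(seq, start))
```

The second hit in this line falls immediately before the poly A stretch and overlaps it, so only the first hexamer is a plausible candidate for the reported signal.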


BLAST Results 


No BLAST result 


Medline entries 


95311974: 

A gene that is related to SRY and is expressed in the testes 
encodes a leucine zipper-containing protein. 

96032826: 

The Sry-related HMG box-containing gene Sox6 is 

expressed in the adult testis and developing nervous system 

of the mouse. 


Peptide information for frame 1 


ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 


1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM 
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST 
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ 
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ 
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ 
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY 
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST 
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK 
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE 
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG 
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE 
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP 
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ 
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF 
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP 
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA 
801 VSAN 
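
The short PROSITE site motifs reported for this peptide can be reproduced with regular expressions. The sketch below assumes the published PROSITE patterns for PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]); the `scan` helper and the dictionary name are illustrative. The end coordinate is written as start plus pattern length, which appears to be the convention used in the Prosite listing of this report (for example 97->101 for a four-residue site); that reading is an assumption.

```python
import re

# PROSITE patterns rewritten as Python regular expressions.
PROSITE_REGEX = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(protein, motif_name):
    """Return (start, end) pairs for every match, overlaps included.
    start is 1-based; end is start + match length."""
    # A lookahead with a capture group lets finditer report overlapping sites.
    regex = re.compile("(?=(" + PROSITE_REGEX[motif_name] + "))")
    hits = []
    for match in regex.finditer(protein):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


# Residues 1-110 of the peptide listed above.
fragment = (
    "MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIM"
    "HNKPHSEELPTLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTST"
    "SPHKPDEGSR"
)
print(scan(fragment, "ASN_GLYCOSYLATION"))
print(scan(fragment, "PKC_PHOSPHO_SITE"))
```

These patterns are short and match frequently by chance, so hits of this kind indicate candidate sites only.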

BLASTP hits 
Entry MMSOXLZ2_1 from database TREMBL: 

product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 

Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801 

Entry I51083 from database PIR: 
SOX-LZ - rainbow trout 

Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532 

Entry S59121 from database PIR: 
SOX 6 protein - mouse 

Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660 
Entry AB006330_1 from database TREMBL: 

gene: "mSoxSL"; product: "SOXS"; Mus musculus mSoxSL mRNA, complete 
cds. 

Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457 
Entry MMU010604_1 from database TREMBL: 

gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for 
transcription factor L-Sox5 

Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281 

Alert BLASTP hits for DKFZphtes3_17n12, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17n12, frame 1 

Report for DKFZphtes3_17n12.1 


[LENGTH] 804 
[MW] 89332.69 
[pI] 6.97 
[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07 
[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YPR065w] 5e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04 
[SCOP] <^^^t 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13 
[SCOP] d1lefa_ 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15 
[SCOP] d1hrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17 
[PIRKW] DNA binding 4e-94 
[PIRKW] T-cell receptor 4e-07 
[PIRKW] leucine zipper 1e-38 
[PIRKW] alternative splicing 2e-07 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 1e-12 
[SUPFAM] HMG box homology 0.0 
[SUPFAM] unassigned HMG box proteins 4e-94 
[PROSITE] ATP_GTP_A 1 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 6 
[PFAM] HMG (high mobility group) box 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 13.81 % 
[KW] COILED_COIL 3.48 % 


SEQ MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP 

SEG 

COILS [[ 

inhm- !!!!!!!!!!!!!!!! 

SEQ TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF 

SEG 

COILS 

Inhm- 

SEQ GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI 

SEG 

COILS !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 

inhm- !!!!!!!!!!!!!!!!!! 

SEQ KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI 

SEG xxxxxxxxxxxxxxx 

COILS CCCCCC 

Inhm- 

SEQ ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY 

SEG xxxxxxxxxxxxxxxxxxxx:: xxxxxx . 

COILS CCCCCCCCCCCCCCCCCCCCCC ][[ 

Inhm- , !!!!!!!!!!!!!!!!!!! 

SEQ KPGDK . 1- yQFI PSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV 

SEG xxxxxxxxxxxx

COILS

Inhm- 

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL 

SEG 

COILS 

Inhm- 

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR 

SEG . . .xxxxxxxxxxxxxxxxxx 

COILS 

Inhm- 

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG 

SEG . .xxxxxxxxxxxx 

COILS 

Inhm- [[[ 

SEQ PGVI DLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP 

SEG 

COILS .* 

Inhm- CCC 

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK 

SEG X 

COILS 

Inhm- CCCHHHHHHHHHHHHHHHTTTTCCHHHHHHHHHHHTTTTTTHHHHHHHHHHHHHHHHHHH 

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQI PITTGTGWYP 

SEG xxxxxxxxxxxx 

COILS 

Inhm- HHHTTTTTTT 

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY 

SEG xxxxxxx 

COILS 

Inhm- 

SEQ DDYEDDPKSDYSSENEAPEAVSAN 

SEG xxxxxx 

COILS 

Inhm-


PS00001 97->101 ASN_GLYCOSYLATION PDOC00001
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006
PS00008 182->188 MYRISTYL PDOC00008
PS00008 431->437 MYRISTYL PDOC00008
PS00008 437->443 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 575->581 MYRISTYL PDOC00008
PS00008 762->768 MYRISTYL PDOC00008
PS00009 677->681 AMIDATION PDOC00009
PS00017 526->534 ATP_GTP_A PDOC00017
PS00029 187->209 LEUCINE_ZIPPER PDOC00029


Pfam for DKFZphtes3_17nl2.1


HMM_NAME HMG (high mobility group) box

HMM *PKRPMNAYMLWMQEMRekIKaENPNdMhNtEISKMiGEMWKnMsEEEKin
 +KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK4
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 644

HMM PYEdMAeeEKqRYMKEMPeYK*
 PY+++ +++ + +++ +p+yK
Query 645 PYYEEQARLSKIHLEKYPNYK 665




DKFZphtes3_17nl8 


group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known 
proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent
receptor protein signature 1. In E. coli, the TonB protein interacts with outer membrane
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In
the absence of TonB, these receptors bind their substrates but do not carry out active
transport. The novel protein seems to be involved in ATP-dependent transport of substances
into the cell.

The new protein can find application in modulating cell permeability and the transport of
suitable substrates into the cell.


unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern

Sequenced by GBF 

Locus: unknown 

Insert length: 2853 bp 

Poly A stretch at pos. 2806, no polyadenylation signal found


1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT 
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA 
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC 
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC 
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT 
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC 
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG 
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG 
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA 
451 GCACAgCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA 
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG 
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG 
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA 
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC 
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA 
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC 
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT 
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA 
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG 
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC 
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG 
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA 
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC 
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA 
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA 
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC 
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT 
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA 
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG 
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA 
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC 
1651 CCAAGCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC 
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC 
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG 
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG 
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA 
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG 
1951 ACGTGGAGCt GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG 
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT 
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT 
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC 
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC 
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG 
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA 
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA 
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG 
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA
2451 TTGGCCCTGG AAGACTATGT GGAt^AAGGAG TTATCTCTGG AGGCTGAGAA
2501 GACAAGAGAG CCTGAAGTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA 
2551 TAACTAGTTG GAAGAAGCAG GCCTCfcAAGA AGTAGCGCCA TCCTGGCAGC 
2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC 
2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGr CCTCCAAGCT TCTATAATAA 
2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG 
2851 CCG 


ORF from 237 bp to 2582 bp; peptide length: 782 
Category: putative protein 
Prosite motifs: ATP_GTP_A (122-130) 
TONB_DEPENDENT_REC_1 (1-44)
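
The ORF coordinates quoted above can be cross-checked against the stated peptide length with a few lines of Python. This is only a sketch: it assumes the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon, and the helper names `orf_peptide_length` and `translate` are illustrative, not part of the original annotation pipeline. The 24 bases in `ORF_START` are copied from positions 237-260 of the insert sequence listed above.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates (stop codon excluded)."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 237-260 of the DKFZphtes3_17nl8 insert (assumed 1-based numbering).
ORF_START = "ATGGCCCGTCAGGTGCGCACCCAC"

print(orf_peptide_length(237, 2582))
print(translate(ORF_START))
```

Under these assumptions the computed length agrees with the reported 782 residues, and the first eight codons translate to the start of the listed peptide.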


1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 
51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP 
101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS 
151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL
201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 
251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 
301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 
351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR 
401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT
451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD 
501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK 
551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT 
601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 
651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG 
701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS 
751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphtes3_17nl8, frame 3 
No Alert BLASTP hits found 


Pedant information for DKFZphtes3_17nl8, frame 3 


Report for DKFZphtes3_17nl8.3


[LENGTH] 782
[MW] 88030.16
[pI] 9.22
[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TONB_DEPENDENT_REC_1 1
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 4
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 4
[KW] Alpha_Beta


SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH 

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL 

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeecccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL 

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT 

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYNHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKNSVVQGM 

PRD ccchhhhhhhhhhhhhcccccccceeeecccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED 

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc 


Prosite for DKFZphtes3_17nl8.3


PS00001 91->95 ASN_GLYCOSYLATION PDOC00001
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001
PS00001 379->383 ASN_GLYCOSYLATION PDOC00001
PS00001 598->602 ASN_GLYCOSYLATION PDOC00001
PS00004 403->407 CAMP_PHOSPHO_SITE PDOC00004
PS00004 511->515 CAMP_PHOSPHO_SITE PDOC00004
PS00004 652->656 CAMP_PHOSPHO_SITE PDOC00004
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005
PS00005 177->180 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 450->453 PKC_PHOSPHO_SITE PDOC00005
PS00005 497->500 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 523->526 PKC_PHOSPHO_SITE PDOC00005
PS00005 631->634 PKC_PHOSPHO_SITE PDOC00005
PS00005 723->726 PKC_PHOSPHO_SITE PDOC00005
PS00005 774->777 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 131->135 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 406->410 CK2_PHOSPHO_SITE PDOC00006
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006
PS00006 466->470 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 571->575 CK2_PHOSPHO_SITE PDOC00006
PS00006 693->697 CK2_PHOSPHO_SITE PDOC00006
PS00006 717->721 CK2_PHOSPHO_SITE PDOC00006
PS00008 145->151 MYRISTYL PDOC00008
PS00008 327->333 MYRISTYL PDOC00008
PS00008 592->598 MYRISTYL PDOC00008
PS00008 734->740 MYRISTYL PDOC00008
PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013
PS00017 122->130 ATP_GTP_A PDOC00017
PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354
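
The Prosite hits in this report are the result of matching short sequence patterns against the peptide. As a rough illustration of how such hits arise, the following sketch converts a simple PROSITE-style pattern into a Python regular expression and scans an excerpt of the DKFZphtes3_17nl8 peptide (residues 81-140, copied from the sequence above). The helper names `prosite_to_regex` and `find_sites` are hypothetical, the converter handles only the basic syntax (`x`, `[..]`, `{..}`, repeat counts) and no anchors, and the half-open `start->end` convention is an assumption inferred from the ranges in the table. The three patterns shown are the published PROSITE patterns for PS00001, PS00017 and PS00006.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (no '<' or '>' anchors) into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat such as (4) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_sites(pattern: str, sequence: str, offset: int = 1):
    """Return (start, end) pairs, 1-based start and exclusive end, including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + offset, m.start() + offset + len(m.group(1)))
        for m in regex.finditer(sequence)
    ]


# Residues 81-140 of the DKFZphtes3_17nl8 peptide.
EXCERPT = "SGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGKEAKKKIGKSRTTEDVSMPPL"

print(find_sites("N-{P}-[ST]-{P}", EXCERPT, offset=81))        # PS00001, N-glycosylation
print(find_sites("[AG]-x(4)-G-K-[ST]", EXCERPT, offset=81))    # PS00017, P-loop
print(find_sites("[ST]-x(2)-[DE]", EXCERPT, offset=81))        # PS00006, CK2 site
```

On this excerpt the sketch reproduces the ranges 91->95, 122->130 and 131->135 listed in the table. The lookahead wrapper is used so that overlapping hits, which do occur in reports of this kind, are not skipped.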


(No Pfam data available for DKFZphtes3_17nl8.3)




DKFZphtes3_18f3 


group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to human
TNF-inducible protein CG12-1.

The novel protein contains two leucine zippers.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


similarity to TNF-inducible protein CG12-1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4608 bp 

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550 


1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 
51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 
101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 
151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 
201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 
251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG 
301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 
351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 
401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT 
451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG 
501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 
551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 
601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 
651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 
701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 
751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 
801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 
851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 
901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 
951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 
1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 
1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 
1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 
1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 
1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC 
1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 
1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 
1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 
1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 
1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA 
1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC 
1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 
1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 
1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTGTAGTC 
1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 
1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 
1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 
1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 
1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 
1951 CGTTCACAGA TGACCAAGGA GAGACTGTGT CCCAGAAGCC AAAATGAGAG 
2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT 
2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 
2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 
2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 
2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 
2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 
2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 
2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 
2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 
2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC 
2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA 
2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT 
2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT
2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA 
2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT
2751 CTGTAAGAAA CTGTCAGTGA AAATATGTAC AATTCCTTCA ATTTCCATTC
2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC
2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT
2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA
2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT
3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA
3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC
3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC
3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC
3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC
3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA
3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA
3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC
3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA
3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT
3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT
3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC
3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC
3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA
3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG
3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT
3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT
3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG
3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT
3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA
4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG
4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT
4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA
4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT
4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT
4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC
4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG
4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA
4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT
4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT
4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA
4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4601 AAAAAAGG
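
The poly(A) stretch and polyadenylation signal positions quoted for this insert can be located with a simple scan of the 3' end. The sketch below is an illustration only: it assumes the canonical AATAAA hexamer is the signal meant, that a run of at least ten A residues counts as the poly(A) stretch, and that the reported positions are 0-based offsets. The names `find_signal`, `find_polya` and `TAIL_OFFSET` are hypothetical. `TAIL` holds the last 108 bases of the sequence as listed.

```python
import re

# Bases 4501-4608 (1-based) of the DKFZphtes3_18f3 insert.
TAIL = (
    "TGAGTTCATTTTTTCCCACTGTAGCAAAATTAATGCTTTCTCTTTATTGA"
    + "AATAAATTGCTCATTCCTCC" + "A" * 30
    + "AAAAAAGG"
)
TAIL_OFFSET = 4500  # assumed 0-based offset of TAIL[0] within the full insert


def find_signal(seq: str, offset: int = 0, signal: str = "AATAAA"):
    """0-based position of the first occurrence of the signal hexamer, or None."""
    i = seq.find(signal)
    return None if i < 0 else i + offset


def find_polya(seq: str, offset: int = 0, min_run: int = 10):
    """0-based position of the first run of at least min_run A residues, or None."""
    m = re.search("A{%d,}" % min_run, seq)
    return None if m is None else m.start() + offset


print(find_signal(TAIL, TAIL_OFFSET))
print(find_polya(TAIL, TAIL_OFFSET))
```

With 0-based offsets the scan agrees with the positions given in the clone description (signal at 4550, poly(A) stretch at 4570); with 1-based numbering each value would be one higher.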


BLAST Results 


Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 


Entry HS073350 from database EMBL: 
human STS EST303564. 
Score = 1417, P = 8.7e-58, identities = 285/287


Medline entries


No Medline entry


Peptide information for frame 2 


ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2 

PIR:CGBO1S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score
= 155, P = 4.5e-10

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10

>PIR:CGBO1S collagen alpha chain - bovine (fragments)

Length = 779

HSPs:

Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10
Identities = 60/152 (39%), Positives = 67/152 (44%)

Query: 7 GEAGGPGAAWARRAAALPGTAA—GPPRPAAPPGA—APARGGPAPGAPAQALPRSQRGR 62 

G+ G PG + AR PG GPP PA P GA AP G A A P SQ 

Sbjct: 230 GDLGAPGPSGARGERGFPGERGVEGPPGPAGPRGANGAPGNDGAKGDAGAP6APGSQGAP 289 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

h G P RGA PG GD +GA G + G VR L + PG A 
Sbjct: 290 GL QGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDG VRGLTGPIGPPGPAG 341 

Query: 123 GAGDRGHL-P-GP DARDPELPRVFLPLAGLRGPPAA 156 

GD+G P GP D +P P P AG GPP A 

Sbjct: 342 APGDKGEAGPSGPAGTRGAPGDRGEPGPPG— P-AGFAGPPGA 381 
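
The percentages in each HSP header follow directly from the hit counts. As a small consistency check, the sketch below parses a header line and recomputes the percentages; the function name `recompute_percentages` is illustrative, and the use of truncation rather than rounding is an assumption that happens to match the headers in this report.

```python
import re

HEADER_FIELD = re.compile(r"(Identities|Positives) = (\d+)/(\d+) \((\d+)%\)")


def recompute_percentages(header: str) -> dict:
    """Map each field to (reported percent, percent recomputed by integer truncation)."""
    result = {}
    for name, hits, total, reported in HEADER_FIELD.findall(header):
        result[name] = (int(reported), 100 * int(hits) // int(total))
    return result


first_hsp = "Identities = 60/152 (39%), Positives = 67/152 (44%)"
second_hsp = "Identities = 52/154 (33%), Positives = 60/154 (38%)"

print(recompute_percentages(first_hsp))
print(recompute_percentages(second_hsp))
```

A mismatch between the reported and recomputed values in any header would point to a transcription error in that line.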

Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05
Identities = 52/154 (33%), Positives = 60/154 (38%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG GPAPGAPAQALPRSQRG 61 

G G PGAA R P AGPP P P G ++G GPA G P + P G 

Sbjct: 434 GATGFPGAA-GRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPA-GRPGEVGPPGPPG 491 

Query: 62 RQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAA 121 

AGP G PG PG RG G +RG R L PG + 
Sbjct: 492 P- - AGEKGAPGAD-GPAGAPGT PGPQGI AGQRGVVGLPGQRGE RGFPGL PGPS 541

Query: 122 EGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVRE 160 

G +G R P P + GL GPP + RE 
Sbjct: 542 GEPGKQGPSGASGERGPPGP MGPPGLAGPPGESGRE 577 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 52/148 (35%), Positives = 62/148 (41%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAA PPGAAPARGGPAPGAPAQALPRSQRG-R 62

G G PG AR +A PG A G P A PPG + GP PG P A +G R 

Sbjct: 416 GNVGAPGPKGARGSAGPPG-ATGFPGAAGRVGPPGPS-GNAGP-PGPPGPAGKEGSKGPR 472 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRH -- HHVRSLADLLQLPGA 120

GRP G + PG PG GA G G + ++ LPG 

Sbjct: 473 GETGPAGRP GEVGPPGPPGPAGEKGAPGADGPAGAPGTPGPQGIAGQRGWGLPGQ 528 

Query: 121 AEGAGDRGH— LPGPDARDPEL-PRVFLPLAGLRGPP 154 

G+RG LPGP + P +G RGPP 

Sbjct: 529 R GERGFPGLPGPSGEPGKQGPS GASGERGPP 559 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 54/162 (33%), Positives = 64/162 (39%)

Query: 7 GEAGGPGAAWARRAAALPGT — AAGPPRPAAPPGAAPARG — GPA — PGAPAQALPRSQR 60 

G G PG + PG A+GP P PPG GGAPGP+P + 

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP— RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV— RSLADLL 115 

G R L G P + HRG G GD +G G G + R L 

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPG®1GPRGLPGFP 148 

Query: 116 QLPGAA -- EG-AGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157

GAA G AG+RG +PGP P AG +GPP A 

Sbjct: 149 GPKGAAGEPGKAGERG-VPGPPGAVG— PAGKDGEAGAQGPPGPA 190 

Score = 113 (17.0 bits), Expect = 5.4e-04, P = 5.4e-04
Identities = 54/148 (36%), Positives = 58/148 (39%)

Query: 7 GEAGGPGAAWARRAAALPGTA AGPPRPAAP PGAAPARGGPAP-GAPAQALPR 57

G AG PGA A PG A AGPP PA P PG G P P GA A P 

Sbjct: 374 GFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPP 433 


Query: 58 SQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 

G A P G PG PG +G G GR V 
Sbjct: 434 GATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPAGRPGEVGP 486

Query: 118 PGAAEGAGDRGHLPGPD-- ARDPELPRVFLPLAGLRG 152

PG AG++G PG D A P P +AG RG 

Sbjct: 487 PGPPGPAGEKG-APGADGPAGAPGTPGP-QGIAGQRG 521 

Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03
Identities = 54/151 (35%), Positives = 60/151 (39%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG-- AAPAR-GGPAP-GAPAQALPRSQRGR 62
GE G G A + LPG A GPP A PG P G P P GA + +RG

Sbjct: 194 GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 252

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122
+ PR GA G GD A G+ G +G R A L PG

Sbjct: 253 EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGL-- PGPK- 307

Query: 123 GAGDRGHLPGPDARD -- PELPRVFLPLAGLRGPPAAA 157

GDRG GP D P V L G GPP A
Sbjct: 308 -- GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340

Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03
Identities = 55/154 (35%), Positives = 60/154 (38%) 

Query: 4 NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 61 

NG+ GEAG PG R P A G P A PG RG GA A P +G 

Sbjct: 67 NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 125 

Query: 62 RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSL ADLL 115 

+ NGP+G PGPGA GG G V A 

Sbjct: 126 EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 184 

Query: 116 QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

PG A AG+RG GP A P F L G GPP A 

Sbjct: 185 GPPGPAGPAGERGE-QGP-AGSPG FQGLPGPAGPPGEA 220 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 44/131 (33%), Positives = 49/131 (37%)

Query: 2 EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 

E GE G PG R LPG GP A . PG A RG P P GA A + 
Sbjct: 126 EPGSPGENGAPGQMGPR GLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 181

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 

G Q P RG G PG G+ G G G+ DL PG 

Sbjct: 182 GAQGPPGPAGPAGERGEQGPAGSPG— FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 237 

Query: 121 AEGAGDRGHLPG 132 

+ G+RG PG 
Sbjct: 238 SGARGERG-FPG 248 


Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 43/131 (32%), Positives = 55/131 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 

GEAG GARA PG GPPPGA GPPGAQ + + G A+ 
Sbjct: 347 GEAGPSGPAGTRGA PGDR-GEPGPPGPAGFA GP-PGADGQPGAKGEPGDAGAK 397 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 

+ P G PG G++ A +GA G G + A + PG + AG 

Sbjct: 398 GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 456 

Query: 127 RGHLPGPDARD 137 

G PGP ++ 
Sbjct: 457 PGP-PGPAGKE 466

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 56/162 (34%), Positives = 62/162 (38%)

Query: 7 GEAGGPGAAWARRAAALPGTAA— GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 64 

G G PGA A G GP P P G A ARG P P Q PR +G 

Sbjct: 608 GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARG-— PAGP-QG-PRGBKG2TG 662 

Query: 65 AERNGRPRRHRG ALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLA-DLLQ-LPG 119 

+ + + HRG PG PG GA G RG SDL LPG 

Sbjct: 663 ZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 722 

Query: 120 AAEGAGDRGHL— PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 

G RG GP A P P P G GPP+ L +P Q 

Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPG P-PGPPGPPSGGYDLSFLPQPPQ 768 

Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02
Identities = 49/148 (33%), Positives = 55/148 (37%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPA QALPRSQRGR 62 

G AG PG A R PG A GP A G A A+G P P PA + P G 

Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGP-- AGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122

Q P G + G PGDL A G G RG R + PG A 

Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAP GPSGARGERGFPGE-RGVEGP PGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 

GGPGD + PG+GP 
Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP— GSQGAP 289 

Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02
Identities = 40/130 (30%), Positives = 48/130 (36%)

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA-- PGAPAQALPRSQR 60
G G PG + PG A+GP P PPG GGAPGP+P +

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88

Query: 61 G-RQLAERNGRP-- RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117

G R L G P + HRG G GD +G G G + L

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147

Query: 118 PGAAEGAGDRG 128

PG AG+ G
Sbjct: 148 PGPKGAAGEPG 158


Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02
Identities = 53/156 (33%), Positives = 61/156 (39%)

Query: 7 GEAGGPGAAWARRA AALPGT--AAGPPRPAAPPGAAPARG-- GPA PGAPAQAL 55

G G PGA R APG AGPPPG+RG GPA P PA A

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108

PR +G + + + HRG G PG + +G G G

Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705

Query: 109 RSLADLLQLPGAAEGAGDRG-- HLPGPDARDPELPRVFLPLAGLRGPP 154

PG+A G G LPGP P PR AG GPP

Sbjct: 706 PGSAGSPGKDGLNGLPGPIG-- PPGPRGRTGDAGPAGPP 742

Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02
Identities = 51/158 (32%), Positives = 58/158 (36%)

Query: 7 GEAGGPGAAWARRAAALPGTA AGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60

G G G R AA LPG AGP PG RG P G P A + 

Sbjct: 287 GAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDK 346

Query: 61 GRQLAERNGRPRRHRGA LAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 

G A+GP RGA +PG PG GA G +G + D 

Sbjct: 347 GE— AGPSG-PAGTRGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP- 402 

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 

PGAAGG-t- AP+R GGPAAR 
Sbjct: 403 PGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGR 444

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02
Identities = 46/152 (30%), Positives = 57/152 (37%)

Query: 6 NGEAGGPGAAWARRAAALPGTAA — GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62 

+G G PGA + PG G PA PG A G P P PA ++ R + G 

Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

P RG G G+ +G G RG H R + L PG 

Sbjct: 634 AGPIGPVGPAGARGPAGPQGPRGB KGZTGZZGBRGIKGH-RGFSGLQGPPGPPG 686

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

G++G PAP AG RGPP +A 

Sbjct: 687 SPGEQG~PS-GASGP AGPRGPPGSA 709 

Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02
Identities = 45/134 (33%), Positives = 56/134 (41%)

Query: 24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR— GALAQ 80 

P GPP PG +G P PG P + P RG G P ++ G + 

Sbjct: 21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP GPPGKNGDDGEAGK 75 


Query: 81 PGHPGDLAA-GV— GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH— LPGPDA 135 

PG PG+ G RG G G H R + L G A AG +G PG + 

Sbjct: 76 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134

Query: 136 RDPEL-PRVFLPLAGLRGPPAAA 157 
++ PR LP G GP AA
Sbjct: 135 APGQMGPRG-LP-- GFPGPKGAA 154

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01
Identities = 52/155 (33%), Positives = 58/155 (37%)

Query: 1 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGP-APGAPAQALPRSQRGRQLA 65 

GEAG G A R A G GPP PA G AGPAGPA + G 
Sbjct: 347 GEAGPSGPAGTRGAPGDRGEP-GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGP 405 

Query: 66 ERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGR— HHHVRSLADLLQLPGAA— 121 

P G + PG G GA G GR A PG A 

Sbjct: 406 AGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGK 465 

Query: 122 EGA-GDRGHLPGPDARDPELPRVFLP-LAGLRGPPAA 156 

EG+ G RG GP R E+ P AG +G P A 

Sbjct: 466 EGSKGPRGET-GPAGRPGEVGPPGPPGPAGEKGAPGA 501

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01
Identities = 51/156 (32%), Positives = 57/156 (36%) 

Query: 7 GEAGGPGAAWARRA AALPGT— AAGPPRPAAPPGAAPARGGPAPGAPAQAL-PRSQR 60 

G G PGA R APG AGPPPG+RG PP +P R 

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLA-AGVG--RGAGGGHSRRGRH--HHVRSLADLL 115 

GAGPR+G+GG G +GG G A 

Sbjct: 647 GP~-AGPQG-PRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPR 703 

Query: 116 QLPGAAEGAGDRG — HLPGPDARDPELPRVFLPLAGLRGPP 154 

PG+A G G LPGP P PR AG GPP 

Sbjct: 704 GPPGSAGSPGKDGLNGLPGPIG—PPGPRGRTGDAGPAGPP 742 

Score = 90 (13.5 bits), Expect = 2.8e-01, P = 2.5e-01 
Identities = 45/134 (33%), Positives = 53/134 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQ-LA 65 

GGPGA+A GAPPPGARG GPQ R +RG L 
Sbjct: 485 GPPGPPGPAGEKGAPGADGPAGAPGTPG-PQGIAGQRG— VVGLPGQ RGERGFPGLP 538 

Query: 66 ERNGRPRRH--RGALAQPGHPGDLA AGV GR-GAGGGHSRRGRHHHVRSLADL 114 

+G P + GA + G PG + AG GR GA G GR + D 

Sbjct: 539 GPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPGAEGSPGRDGSPGAKGDR 598 

Query: 115 LQL-PGAAEGAGDRGHLPGP 133 
+ P A G PGP 

Sbjct: 599 GETGPAGAPGPPGAPGAPGP 618 

Score = 83 (12.5 bits), Expect = 1.8e+00, P = 8.3e-01 
Identities = 49/156 (31%), Positives = 56/156 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA— GPPRPAAPPGAAPARG— GPAP—GAPAQALPRSQR 60 

G+AG GA A + G GPP PA PG G GPA GAP R + 

Sbjct: 311 GDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDKGEAGPSGPAGTRGAPGD RGEP 367 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 

G P G G PGD A G G G + PG 
Sbjct: 368 GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVG APGP 423 

Query: 121 AEGAGDRGHLPGPDARDPELPRVFLP LAGLRGPPAAAVRE 160 

G G PG RV P AG GPP A +E 

Sbjct: 424 KGARGSAGP-PGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKE 466 

Score = 82 (12.3 bits), Expect = 2.3e+00, P = 9.0e-01 
Identities = 46/148 (31%), Positives = 52/148 (35%) 

Query: 7 geaggpgaawarraaalpgtaagpprpaappgaaparggpapgapaqalprsqrgrqlae 66 

G+AG PGA ++ALGG APG RGPAP RL 
Sbjct: 275 GDAGAPGAPGSQGAPGLQGMP-GERGAAGLPGPRGDRGDAGPKG-ADGAPGKDGVRGLTG 332 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 

G P G PG G+ G G RG A PGA G 

Sbjct: 333 PIGPP GPAGAPGDKGEAGPSGPAGTRGAPGDRGEPGPPGP-AGFAGPPGADGQPGA 387 

Query: 127 RGHLPGP-DARDPELPRVFLPLAGLRGPP 154 

+G PG A+ P P AG GPP 
Sbjct: 388 KGE-PGDAGAKGDAGPPG— P-AGPAGPP 412 
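
The Expect and P values quoted in the HSP headers above appear to be linked by the usual Poisson relation P = 1 - exp(-E), that is, the probability of seeing at least one chance hit when E hits are expected. The following is a minimal sketch of that check, not the scoring code used to produce the report; the function name `p_from_expect` is hypothetical and the relation itself is an assumption inferred from the printed numbers.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # expm1 keeps precision when the Expect value is very small.
    return -math.expm1(-expect)


# Expect values copied from HSP headers in this report.
for expect in (5.7e-02, 9.7e-02, 1.8e+00, 2.3e+00):
    print(f"Expect = {expect:.1e}  ->  P = {p_from_expect(expect):.1e}")
```

For the four Expect values listed, the computed P agrees with the printed P to two significant figures. For very small Expect values the two numbers are practically identical, which is why the high-scoring hits show Expect and P as the same figure.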


Peptide information for frame 3 
ORF from 12 bp to 755 bp; peptide length: 248 
Category: similarity to known protein 

Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (17-39) 
LEUCINE_ZIPPER (24-46) 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 3 

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo 
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score 
= 135, P = 1e-06 

TREMBL:HS6802_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens 
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L, 
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107, 
P = 0.0023 

>TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo sapiens 
TNF-inducible protein CG12-1 mRNA, complete cds. 
Length = 331 

HSPs: 

Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06 
Identities = 30/103 (29%), Positives = 55/103 (53%) 

Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89 

++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A 
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNVVSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150 

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132 

G+G+ A IT+ + + +S E + AT D+++ 

Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193 
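
The percentages printed after each Identities and Positives count can be re-derived from the raw fractions. Below is a small sketch that parses one such header line and recomputes the percentages. The names `HSP_RE`, `parse_hsp_line` and `pct` are hypothetical, and the use of truncation rather than rounding is an assumption that happens to match the figures in this report (for example 57/152 is printed as 37%).

```python
import re

# Matches lines such as "Identities = 30/103 (29%), Positives = 55/103 (53%)".
HSP_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def pct(count: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed convention)."""
    return 100 * count // length


def parse_hsp_line(line: str) -> dict:
    """Extract the counts from one HSP summary line and recompute percentages."""
    match = HSP_RE.search(line)
    if match is None:
        raise ValueError(f"not an Identities/Positives line: {line!r}")
    ident, length, ident_pct, pos, pos_length, pos_pct = map(int, match.groups())
    if length != pos_length:
        raise ValueError("alignment lengths disagree")
    return {
        "length": length,
        "identities": ident,
        "positives": pos,
        "printed": (ident_pct, pos_pct),
        "recomputed": (pct(ident, length), pct(pos, length)),
    }


hit = parse_hsp_line("Identities = 30/103 (29%), Positives = 55/103 (53%)")
print(hit["printed"], hit["recomputed"])
```

A mismatch between the printed and recomputed pair is a quick way to spot a digit that was damaged when a header line was transcribed.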

Pedant information for DKFZphtes3_18f3, frame 2 

Report for DKFZphtes3_18f3.2 

[LENGTH] 193 
[MW] 19708.24 
[pI] 11.90 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 55.44 % 

SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc 

SEQ LPHPQAGGGGHQG 
SEG xxxxxxxxxxxxx 
PRD ccccccccccccc 

(No Prosite data available for DKFZphtes3_18f3.2) 
(No Pfam data available for DKFZphtes3_18f3.2) 
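
The LOW_COMPLEXITY percentage in a report of this kind is presumably the fraction of residues flagged in the SEG mask. The sketch below shows that calculation on a synthetic mask; `mask_coverage` is a hypothetical helper, and the figure of 107 masked residues is an inference from 55.44 % of 193, not a number stated in the report.

```python
def mask_coverage(mask: str, masked_chars: str = "xX") -> float:
    """Percentage of positions in an annotation mask that are flagged."""
    if not mask:
        raise ValueError("empty mask")
    flagged = sum(1 for ch in mask if ch in masked_chars)
    return 100.0 * flagged / len(mask)


# Synthetic 193-column mask with 107 flagged positions (assumed count).
example_mask = "x" * 107 + "." * 86
print(f"LOW_COMPLEXITY {mask_coverage(example_mask):.2f} %")
```

The same helper applies to any per-residue annotation row in these reports, for example by passing `masked_chars="C"` for a COILS row or `masked_chars="M"` for a MEM row.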

Pedant information for DKFZphtes3_18f3, frame 3 

Report for DKFZphtes3_18f3.3 

[LENGTH] 248 
[MW] 27162.56 
[pI] 9.92 
[PROSITE] LEUCINE_ZIPPER 2 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 30.65 % 
[KW] COILED_COIL 12.10 % 


SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX . XXXXXXXXXXXXXXXXXXXX . . XXX 

PRO cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 


COILS 
MEM 


SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV 

SEG XXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 

COILS 

MEM 


SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 


Prosite for DKFZphtes3_18f3.3 

PS00029 17->39 LEUCINE_ZIPPER PDOC00029 

PS00029 24->46 LEUCINE_ZIPPER PDOC00029 


(No Pfam data available for DKFZphtes3_18f3.3) 
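
The two leucine zipper hits can be reproduced with a simple pattern scan. The PROSITE entry PS00029 is defined by the pattern L-x(6)-L-x(6)-L-x(6)-L; the sketch below translates it into a regular expression and scans the first 60 residues of the frame 3 peptide as printed in the report. The helper name `find_leucine_zippers` is hypothetical, and this is only an illustration, not the scanner that produced the report.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L, wrapped in a lookahead so overlapping hits are found.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

# First 60 residues of the frame 3 peptide, copied from the SEQ row of the report.
N_TERMINUS = "MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS"


def find_leucine_zippers(seq: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of every pattern match."""
    spans = []
    for match in LEUCINE_ZIPPER.finditer(seq):
        start = match.start() + 1
        end = match.start() + len(match.group(1))
        spans.append((start, end))
    return spans


print(find_leucine_zippers(N_TERMINUS))
```

The scan finds two overlapping matches starting at residues 17 and 24, in agreement with the report. Each match spans 22 residues, so the inclusive end positions computed here are 38 and 45; the report lists 39 and 46, which may simply reflect a different end-coordinate convention.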
DKFZphtes3_18l7 


group: cell structure and motility 

DKFZphtes3_18l7 encodes a novel 1050 amino acid protein with weak partial similarity to 
ankyrins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins that interconnect integral membrane proteins with 
the spectrin-based membrane skeleton. The novel protein therefore seems to be involved in 
coupling the cytoskeleton to the cell membrane. 

The new protein can find application in modulating cytoskeleton-membrane interactions. 


similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found 


1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT 
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TGATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT 
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 AGCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC 
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG 
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC CTATCACCTG TCCTTCGAGA 
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC 
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC 
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG 
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA 
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC 
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT 
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT 
2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGCTGA 
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA 
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT 
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT 
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT 
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT 
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG 
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG 
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC 
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA 
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC 
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT 
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG 
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT 
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC 
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA 
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT 
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT 
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG 
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA 
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC 
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT 
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA 
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA 
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT 
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA 
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT 
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA 
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG 
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG 
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA 
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT 
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT 
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG 
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG 
4501 G 


BLAST Results 


No BLAST result 


"Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (945-953) 


1 MALYDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST 
51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF 
101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE 
151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK 
201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ 
251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ 
301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY 
351 CLTSFEAAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC 
401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND 
451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL 
501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY 
551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET 
601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS 
651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE 
701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV 
751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN 
801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA 
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV 
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG 
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS 
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS 
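
The ORF coordinates and the peptide length quoted above are mutually consistent, and the first residues of the peptide follow directly from the nucleotide sequence. The sketch below shows both checks using the standard genetic code. The function names are hypothetical, the 30-nucleotide fragment is copied from positions 134 to 163 of the sequence listed above, and the ORF end coordinate is assumed to exclude the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 134-163 of the DKFZphtes3_18l7 insert, as listed above.
orf_start = "ATGGCTCTGTATGATGAAGACCTCCTGAAA"

print(orf_peptide_length(134, 3283))
print(orf_peptide_length(12, 755))
print(translate(orf_start))
```

The two length checks correspond to the 1050-residue and 248-residue peptides reported for DKFZphtes3_18l7 frame 2 and DKFZphtes3_18f3 frame 3, and the translated fragment corresponds to the first ten residues of the peptide listing.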

BLASTP hits 

No BLASTP hits available 


Alert BLASTP hits for DKFZphtes3_18l7, frame 2 

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin 
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21 

PIR:149502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27 

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31 

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE 
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31 

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score = 
380, P = 8.2e-31 


>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 

HSPs: 

Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31 
Identities = 139/447 (31%), Positives = 207/447 (46%) 


Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 

+G+T LH+AA+ GQ ++ LV+ GA VNA G TPL++A Q+ + V LL A+ 
Sbjct: 77 KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 136 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVES-CRL 558 

V +G TPL +A GHE+ V L+ Y + RL 
Sbjct: 137 QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 196 

Query: 559 — DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 615 

D+ ++ G TPLHIAA + V + LL GAS + TPL A S+ +V+ 

Sbjct: 197 PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA— SRRGNVIM 254 

Query: 616 AYHLSFERRQKSSEAPVQSPQRSVDSISQESSTS-SFSSMSAGSR-QEETKKDYREVEKL 673 

L +R + E + + ++ S + G+ Q +TK + 

Sbjct: 255 V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 311 

Query: 674 LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 732 

A GD L+ VR LL++ E ++D T+ P H C R+AKV 
Sbjct: 312 — -AAQGDHLDCVRLLLQYDAE-IDDI— TLDHLTP— LHVAAHC GHHRVAKVLL 358 

Query: 733 S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 791 

G N + +G +PLH+A ++ LLLK GA+ A PLH+A GH 

Sbjct: 359 DKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLP 418 

Query: 792 VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 851 

+VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 

Sbjct: 419 IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 478 

Query: 852 IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVV 896 
H +V+LLL + A+ + T + A + + +L ++ 

Sbjct: 479 RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 523 

Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30 
Identities = 130/447 (29%), Positives = 195/447 (43%) 

Query: 465 TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 524 

TPLH AA G + ++L+ GA + A +G +P+H+A Q + LLL Y A + 

Sbjct: 274 TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 333 

Query: 525 QDNNGNTPLHLACTYGHEDCVKALVYYDVE SCR 557 

+ TPLH+A GH K L+ + +c+ 
Sbjct: 334 ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 393 

Query: 558 — LDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 

+D E G TPLH+A+ G+ +++ LLQ GAS + N ETPL A + V 
Sbjct: 394 GASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 453 


Query: 615 EAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLL 674 
+ Y L + + + Q+p I + +A T L 
Sbjct: 454 K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH----TPLH 508 

Query: 675 RAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPASG 734 
A +G +E V LLE ++AT PH+KA+Ii + 
Sbjct: 509 IAAREGHVETVLALLE KEASQACMTKKGFTP--LHVAAKYGKVRVAELLLER D 559 

Query: 735 LGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVK 794 
N ++G +PLHVA H D+++LLL G + + + PLH+A +Q +V + 
Sbjct: 560 AHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 619 

Query: 795 CLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEK 854 
LL N + + G TPL A GH E+VALLL A+ N N G T LH E 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 679 

Query: 855 HVFVVELLLLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
HV V ++L+ HG V + T + A N K+++ L 
Sbjct: 680 HVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 



Score = 367 (55.1 bits), Expect = 1.8e-29, Sum P(2) = 1.8e-29 
Identities = 131/489 (26%), Positives = 210/489 (42%) 


Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPL-CFCDDCEKLVSGRLNDPSVVTPFSRD 460 
HIAS GN V LL + + + PL C + +S L D ++ 
Sbjct: 244 HIASRRGKVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 302 

Query: 461 DRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKA 520 
G +P+H+AA + LL+ A ++ TPLH+A G+ V +LL A 
Sbjct: 303 KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 362 

Query: 521 SAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGV 580 
+ NG TPLH+AC H ++ L+ +D E G TPLH+A+ G+ + 
Sbjct: 363 KPNSRALNGFTPLHIACKKNHVRVMELLLK---TGASIDAVTESGLTPLHVASFMGHLPI 419 

Query: 581 IETLLQNGASTEIQNRLKETPLKCAL---NSKILSVMEAYHLSFERRQKSSEAPVQSPQR 637 
++ LLQ GAS + N ETPL A ++++ + + K + P+ R 
Sbjct: 420 VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAAR 479 

Query: 638 SVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTE 693 
++ + E++ + + +AG VE +L + + +T 
Sbjct: 480 IGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTP 539 

Query: 694 EDLEDAEDTVSAAD PEFCHPLCQ CP-KCAPAQKRLAKVPA SGLGVNVTS 741 
+ VA+ HP PALV G+ + 
Sbjct: 540 LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPA 599 

Query: 742 QDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNA 801 
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 600 WNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 659 

Query: 802 KPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVEL 861 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ 
Sbjct: 660 NGNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKF 719 

Query: 862 LLLHGASVQVLNK 874 
LL H A V K 
Sbjct: 720 LLQHQADVNAKTK 732 



Score = 345 (51.8 bits), Expect = 4.2e-27, Sum P(2) = 4.2e-27 
Identities = 146/506 (28%), Positives = 233/506 (46%) 

Query: 404 HIAS—GNQKEVERLLSQEDHDKDTVQK MCHPLCFCDDCEKLVSGRLNDPSVVTPFS 458 

H+AS G+ K V LL +E + T +K H +++V +N + V + 

Sbjct: 50 HLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQ-DEVVRELVNYGANVN— A 106 

Query: 459 RDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 518 

+ +G TPL++AA ++• L+ GA N G TPL +A Q+G+++V L++Y 

Sbjct: 107 QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 166 

Query: 519 KASAEVQDNNGNTP-LHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGY 577 

+V+ P LH+A ++D A V + D+ ++ G TPLHIAA + 

Sbjct: 167 GTKGKVR LPALHIAAR — NDDTRTAAVLLQNDP-NPDVLSKTGFTPLHIAAHYEN 218 

Query: 578 QGVIETLLQNGASTEIQNRLKETPLKCAL— NSKILSVMEAYHLSFERRQKSSEAPVQS 634 

V + LL GAS + TPL A N ++ ++ E + K P+ 

Sbjct: 219 LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 278 

Query: 635 PQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 693 

R+ E + + A +TK + A GD L+ VR LL++ 

Sbjct: 279 AARNGHVRISEILLDHGAPIQA KTKNGLSPIHM AAQGDHLDCVRLLLQYDA 329 


WO 01/12659 

PCT/IB00/01496 


Query: 694 EDLEDAE-DTVSAAD-PEFC--HPLCQC PK CAPAQKRLAK 729 
E ++D D++ CH++ P C R+ + 
Sbjct: 330 

Query: 730 
+G ++ ++ G +PLHVA+ G +++ LL+ GA+ N PLH+A + G 
Sbjct: 389 

Query: 789 
H +V K LL + AK N K TPL A GH +V LLL++ A+ N + G+T LH 
Sbjct: 449 

Query: 849 
A E HV V LL AS + K+ T + A + K+ ELL 
Sbjct: 509 

Score = 243 
Identities = 64/199 (32%), Positives = 97/199 (48%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
H+A+ G + E LL ++ H + PL L +L P +P S 
Sbjct: 541 HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 600 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 

G+TPLH+AA Q + L+ G NA G TPLHLA Q+G+ + LLL +A+ 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 

+ + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++ 

Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV— MVDATTRMGYTPLHVASHYGNIKLV 717 

Query: 582 ETLLQNGASTEIQNRLKETPL 602 

+ LLQ+ A + +L +PL 
Sbjct: 718 KFLLQHQADVNAKTKLGYSPL 738 

Score = 242 (36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 63/176 (35%), Positives = 92/176 (52%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 

G VN T Q+G +PLH+A+ G ++RLLL GA + D+ PLH A + GH ++ 
Sbjct: 229 GASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRIS 288 

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 

+ LLD A K +G +P+ A G H + V LLLQ+ A I+ T LH A 

Sbjct: 289 EILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHC 348 

Query: 854 KHVFVVELLLLHGA--SVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAET 909 

H V ++LL GA+ + LN +C + + ++MELL AS+D V E+ 

Sbjct: 349 GHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTG ASIDAVTES 403 

Score = 242 (36.3 bits), Expect = 3.3e-14, Sum P(2) = 3.3e-14 
Identities = 80/284 (28%), Positives = 129/284 (45%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 

HIA+ G+ + V LL +E +K PL K+ L P + 

Sbjct: 508 HIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGK 567 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 

G TPLHVA ++ LL+ +G ++ ++G TPLH+A ++ V LL Y S 

Sbjct: 568 NGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGS 627 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 

A + G TPLHLA GH + V L+ ++GN+ G TPLH+ A+ G+ V 

Sbjct: 628 ANAESVQGVTPLHLAAQEGHAEMVALLLSKQANG---NLGNKSGLTPLHLVAQEGHVPVA 684 

Query: 582 ETLLQNGASTEIQNRLKETPLKCAL— NSKILSVMEAYHLSFERRQKSSEAPV-QSPQR 637 

+ L+++G + R+ TPL A N K++ + + + K +P+ Q+ Q+ 

Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 638 S-VDSISQ--ESSTSSFSSMSAGSRQEETKK--DYREVEKLLRAVAD 679 

D ++ ++ S S G+ K Y V +L+ V D 

Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVVTD 791 

Score = 235 (35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 58/165 (35%), Positives = 83/165 (50%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 

G N S G +PLH+AA G A+++ LLL AN N PLHL Q+GH V 

Sbjct: 625 GGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVA 684 




Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
L+ + G TPL A G+ +LV LLQH A +NA G + LH+A + 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 854 KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS--KIMELLQVV 896 
H +V LLL +GAS ++ T + A++ + ++L+VV 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789 


Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 67/202 (33%), Positives = 100/202 (49%) 


Query: 

404 

UTHc^nMntfPi/cor t cmrnunvrvpi/rktrMr'u— — dt r'cr'nrv—nrT wc^dt kinDCUXT^DPCD 
nino'^uNUnwVCiKl^ijaUC'L'IlUJ^Lri VUJ\nL.n''~rJjCc^UIA«~Cil\JjVoUC\ijNUlroVVi rt bK 

405f 



H+A+ G+ + RLL QD + D+ +HPL C V+LD PSR 


Sbjct: 310 HMAAQGDHLDCVRLLLQYDAEIDDIT-LDHLTPLHVAAHCGHHRVAKVLLDKGA-KPNSR 367 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
G TPLH+A +++LL+ GA ++A G TPLH+A G+ + LL 
Sbjct: 368 ALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRG 427 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
AS V + TPLH+A GH + K L+ +++ + TPLH AAR G+ 
Sbjct: 428 ASPNVSNVKVETPLHMAARAGHTEVAKYLLQ---NKAKVNAKAKDDQTPLHCAARIGHTN 484 

Query: 580 VIETLLQNGASTEIQNRLKETPLKCA 605 
+++ LL+N A+ + TPL A 
Sbjct: 485 MVKLLLENNANPNLATTAGHTPLHIA 510 


Score = 226 (33.9 bits), Expect = 7.0e-33, Sum P(2) = 7.0e-33 
Identities = 53/153 (34%), Positives = 83/153 (54%) 


Query: 743 DGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNAK 802 
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 803 PNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 862 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 

Query: 863 LLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
LHAV K ++ AQ ++I+LL 
Sbjct: 721 LQHQADVNAKTKLGYSPLHQAAQQGHTDIVTLL 753 


Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11 
Identities = 51/157 (32%), Positives = 82/157 (52%) 


Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
+ T++ G++ LH+AAL G+ +++R L+ +GAN A++ PL++A Q+ H +VVK L 
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFL 130 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
L++ AN G TPL A GH +VA L+ +G ALH A 
Sbjct: 131 LENGANQNVATEDGFTPLAVALQQGHENVVAHLINYGTK----GKVRLPALHIAARNDDT 186 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAE--QNSKIMELL 893 
+LL + + VL+K T + A +N + +LL 
Sbjct: 187 RTAAVLLQNDPNPDVLSKTGFTPLHIAAHYENLNVAQLL 225 

Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29 
Identities = 55/143 (38%), Positives = 68/143 (47%) 


Query: 463 GHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASA 522 
GHTPLH+AA G + L+ K A G TPLH+A + G V LLL A 
Sbjct: 503 GHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHP 562 

Query: 523 EVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIE 582 
NG TPLH+A + + D VK L+ S N G TPLHIAA+ V 
Sbjct: 563 NAAGKNGLTPLHVAVHHNNLDIVKLLLPRG-GSPHSPAWN--GYTPLHIAAKQNQVEVAR 619 

Query: 583 TLLQNGASTEIQNRLKETPLKCA 605 
+LLQ G S ++ TPL A 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLA 642 


Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28 
Identities = 54/185 (29%), Positives = 89/185 (48%) 


Query: 738 NVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 797 
N+ ++ G +PLH+ AG + +L+KHG A PLH+A G+ ++VK LL 
Sbjct: 662 NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 721 

Query: 798 DSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 857 
A N K G +PL A GH ++V LLL++GAS N ++ G T L A ++ 






Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781 

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917 

V ++L + V++ V+S PV+ DV+E + +E ++ 

Sbjct: 782 VTDVLKV VTDETSFVLVSDKHRMS FPETVDEILDVSEDEGEELISF 827 

Query: 918 KIRKK 922 
K ++ 

Sbjct: 828 KAERR 832 

Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 41/121 (33%), Positives = 67/121 (55%) 

Query: 486 GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 545 

G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V 

Sbjct: 35 GVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVV 94 

Query: 546 KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 605 

+ LV Y ++ ++KG TPL++AA+ + V++ LL+NGA+ + TPL A 

Sbjct: 95 RELVNY GANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVA 151 

Query: 606 L 606 
L 

Sbjct: 152 L 152 

Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06 
Identities = 89/318 (27%), Positives = 140/318 (44%) 

Query: 448 LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKG 507 

L + + V ++DD+ TPLH AA G +++ LL+ AN G TPLH+A ++G 

Sbjct: 457 LQNKAKVNAKAKDDQ — TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREG 514 

Query: 508 YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD 552 

+ L LL +AS G TPLH+A YG + L+ D 

Sbjct: 515 HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 574 

Query: 553 --VESCRLDI GNE KGDTPLHIAARWGYQGVIETLLQNGASTEIQNRL 597 

V LDI G+ G TPLHIAA+ V +LLQ G S ++ 

Sbjct: 575 VAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQ 634 

Query: 598 KETPLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 656 

TPL A M A LS +Q + +S + ++QE + 

Sbjct: 635 GVTPLHLAAQEGHAE-MVALLLS KQANGNLGNKSGLTPLHLVAQEGHVPVADVLIKH 690 

Query: 657 GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 716 

G + T + LA G++++V++LL+ + D+ +A+ + + PL Q 

Sbjct: 691 GVMVDATTR — MGYTPLHVASHYGNIKLVKFLLQH-QADV-NAKTKLGYS PLHQ 740 

Query: 717 CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 751 

+ + + +G N S DG++PL +A 

Sbjct: 741 AAQQGHTDI-VTLLLKNGASPNEVSSDGTTPLAIA 774 

Score = 162 (24.3 bits). Expect = 1.8e-07, Sum P(2) = 1.8e-07 
Identities = 48/149 (32%), Positives = 71/149 (47%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 

V D ++ TUV G 0 L++G + N + LHLA ++GH ++V L 

Sbjct: 5 VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 64 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 

L GNT LAG E+V L+ +GA++NA + KG T L+ A E H+ 

Sbjct: 65 LHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 124 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885 

VV+ LL +GA+ V + T + A Q 
Sbjct: 125 EVVKFLLENGANQNVATEDGFTPLAVALQ 153 

Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26 
Identities = 38/135 (28%), Positives = 65/135 (48%) 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 

+ G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y 
Sbjct: 42 NQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG 101 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 

A+ Q G TPL++A H + VK L+ ++ E G TPL +A + G++ 

Sbjct: 102 ANVNAQSQKGFTPLYMAAQENHLEVVKFLLE NGANQNVATEDGFTPLAVALQQGHEN 158 

Query: 580 VIETLLQNGASTEIQ 594 

V+ L+ G +++ 
Sbjct: 159 VVAHLINYGTKGKVR 173 




Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21 
Identities = 37/119 (31%), Positives = 58/119 (48%) 

Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ— DNNGNTPLHLACTYGHEDCVKALVYYDVE 554 

AT A + G ++ L H + ++ + KG LHLA GH V L++ ++ 
Sbjct: 13 ATSFLRAARSG— NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70 

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 

L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+ 

Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127 

Query: 615 E 615 

Sbjct: 128 K 128 

Score = 106 (15.9 bits). Expect = 1.8e-01, Sum P(2) = 1.6e-01 
Identities = 34/121 (28%), Positives = 54/121 (44%) 

Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828 

+ GRADA A + G+ L+ N++G LA GH ++V 

Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63 

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888 

LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q + 

Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123 

Query: 889 I 889 
+ 

Sbjct: 124 L 124 

Score = 40 (6.0 bits). Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 

+RRQ+ EVQ++ +Q+ + Q++ +K++R V 

Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQVVQGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKV 1669 

Score = 38 (5.7 bits). Expect = 2.6e-14, Sum P(2) = 2.6e-14 
Identities = 6/12 (50%), Positives = 10/12 (83%) 

Query: 806 kdlSGNTPLIYA 817 

+D++G T L+YA 
Sbjct: 1186 EDITGTTKLVYA 1197 


Pedant information for DKFZphtes3_1817, frame 2 


Report for DKFZphtes3_1817.2 


[LENGTH] 1050 
[MW] 117013.72 
[pI] 6.47 
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034c] 5e-13 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233c] 3e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04 
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att 
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (GABP) alpha GA bindin 4e-12 
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12 
[PIRKW] phosphotransferase 1e-19 
[PIRKW] nucleus 1e-13 
[PIRKW] potassium channel 5e-15 
[PIRKW] early protein 2e-13 
[PIRKW] tumor suppressor 1e-09 
[PIRKW] duplication 1e-14 
[PIRKW] tandem repeat 1e-19 
[PIRKW] heterodimer 1e-14 
[PIRKW] potassium transport 5e-15 
[PIRKW] cell cycle control 1e-10 
[PIRKW] serine/threonine-specific protein kinase 1e-19 
[PIRKW] transmembrane protein 5e-15 
[PIRKW] transport protein 5e-15 
[PIRKW] DNA binding 2e-11 
[PIRKW] oncogene 1e-08 
[PIRKW] ATP 1e-19 
[PIRKW] protein kinase inhibitor 1e-09 
[PIRKW] voltage-gated ion channel 5e-15 
[PIRKW] phosphoprotein 4e-38 
[PIRKW] apoptosis 1e-19 
[PIRKW] liver 4e-09 
[PIRKW] integrin binding 3e-16 
[PIRKW] differentiation 2e-12 
[PIRKW] transforming protein 1e-08 
[PIRKW] alternative splicing 1e-40 
[PIRKW] coiled coil 1e-14 
[PIRKW] peripheral membrane protein 2e-38 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 2e-16 
[PIRKW] nucleotide binding 5e-15 
[PIRKW] phosphoric monoester hydrolase 1e-12 
[PIRKW] cytoskeleton 8e-39 
[PIRKW] calmodulin binding 1e-19 
[PIRKW] smooth muscle 1e-12 
[SUPFAM] ankyrin 1e-40 
[SUPFAM] death-associated protein kinase 1e-19 
[SUPFAM] ankyrin repeat homology 1e-40 
[SUPFAM] protein kinase homology 1e-19 
[SUPFAM] vaccinia virus 27.4K HindIII-C protein homology 3e-07 
[SUPFAM] int-3 transforming protein 1e-08 
[SUPFAM] unassigned ankyrin repeat proteins 2e-38 
[SUPFAM] notch protein 2e-12 
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13 
[SUPFAM] rel homology 2e-11 
[SUPFAM] EGF homology 2e-12 
[PROSITE] ATP_GTP_A 1 
[PFAM] Ank repeat 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.05 % 


SEQ MALYDEDLLKMPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP 
SEG 
1awcB 

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR 
SEG 
1awcB 

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA 
SEG 
1awcB 

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVEIYVHHEIYNLIFKYVGTMEASEDAAFN 
SEG 
1awcB 

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ 
SEG 
1awcB 

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE 
SEG xxxxxxxxxx 
1awcB 

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE 
SEG 
1awcB 

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID 
SEG 
1awcB 

SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG 
SEG 
1awcB 

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET 
SEG 
1awcB 

SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ 
SEG xxxxxxxxxxxxxxxxxxxxxx 
1awcB 

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC 
SEG 
1awcB 

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP 
SEG 
1awcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTT^ 

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN 
SEG 
1awcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHH^ 

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQWPSCV 
SEG 
1awcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBT'n'BCHHHHH^ 

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD 
SEG 
1awcB 

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR 
SEG 
1awcB 

SEQ RHTVEDAWSQGPEAAGPLSTPQEVSASRS 
SEG 
1awcB 


Prosite for DKFZphtes3_1817.2 
PS00017 945->953 ATP_GTP_A PDOC00017 


Pfam for DKFZphtes3_1817.2 
HMM_NAME Ank repeat 

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+AA ++ +.+++LL+++GA +N 
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN 490 

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G TPLH+A++ + ++ LLL + A+ 

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523 

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+A+ Y+++++V+ L+ + 
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556 

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+TPLHIAAR + +++ LLQ+GA+ 

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592 

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G +PLH+AA +++ +-I-+RLLL+HGA+ 
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771 

36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
PLH+A+++++ ++V+ LL+ +A +N 

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804 

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPL++A+ ++ E+V LLLQHGA+IN 
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837 

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+T+LH A+++ +V +V+LLL HGA++ 

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870 

group: testes derived 

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae 
protein YFL046w. 

The protein contains an RGD cell attachment site. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405.0/.3 cR from top of Chr11 linkage group" 
Insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found 


1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 
51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 
101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 
151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 
201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 
251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 
301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 
351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 
401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 
451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 
501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 
551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 
601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 
651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 
701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG 
751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT 
801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 
851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 
901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 
951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 
1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 
1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 
1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 
1151 GGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 
1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 
1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 
1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA 
1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 


BLAST Results 


Entry HS419346 from database EMBL: 
human STS WI-13569. 
Score = 2154, P = 8.6e-91, identities = 446/459 

Entry HS1292427 from database EMBL: 
human STS SHGC-50338. 
Score = 1737, P = 7.2e-72, identities = 359/369 

Entry HS253344 from database EMBL: 
human STS WI-13893. 
Score = 1578, P = 1.0e-64, identities = 358/397 


Medline entries 


No Medline entry 

Peptide information for frame 3 


ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 
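
The ORF coordinates and the reported frame and peptide length can be cross-checked with simple arithmetic. The sketch below assumes 1-based inclusive coordinates and an ORF end that excludes the stop codon (both inferred from the figures in this listing, not stated by it); the function names are illustrative only.

```python
def orf_frame(start: int) -> int:
    """Reading frame (1, 2 or 3) of an ORF given its 1-based start position."""
    return (start - 1) % 3 + 1


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF span."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 156 bp to 917 bp, reported as frame 3 with peptide length 254.
print(orf_frame(156), orf_peptide_length(156, 917))
```

The same check applies to the other ORFs in this listing, for example 1757 bp to 2383 bp (frame 2, 209 residues).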


1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD 
51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK 
101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ 
151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT 
201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY 
251 RFWK 
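
The peptide above corresponds to a translation of the cDNA from position 156 onward. A minimal sketch of that step, assuming the standard genetic code; the helper name `translate` is hypothetical, and the 30-nucleotide input is copied from positions 156-185 of the insert sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Positions 156-185 of the DKFZphtes3_19f19 insert (start of the ORF).
orf_start = "ATGAATAGTCGCCAGGCTTGGCGGCTCTTT"
print(translate(orf_start))
```

The ten residues obtained this way agree with the first block of the peptide listing (MNSRQAWRLF).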


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19f19, frame 3 

SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME 
I, N = 1, Score = 144, P = 8.4e-09 

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 138, P = 5.4e-08 


>SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I. 
Length = 211 

HSPs: 

Score = 144 (21.6 bits). Expect = 8.4e-09, P = 8.4e-09 

Identities = 34/121 (28%), Positives = 67/121 (55%) 
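
The percentages in the Identities and Positives fields appear to be the match counts divided by the alignment length, truncated to a whole number. That truncation is an inference from the figures in this listing (for example 30/67 is printed as 44% and 71/149 as 47%), not a documented rule; the function name is illustrative.

```python
def hsp_percent(matches: int, alignment_length: int) -> int:
    """Whole-number percentage as printed in the HSP header (truncated, not rounded)."""
    return 100 * matches // alignment_length


# Identities = 34/121 (28%), Positives = 67/121 (55%)
print(hsp_percent(34, 121), hsp_percent(67, 121))
```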


Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128 

LE G+ AETI + + ++ +L + K + +A+QE ++ QQ L IRK + 
Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104 

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 


Pedant information for DKFZphtes3_19f19, frame 3 


Report for DKFZphtes3_19f19.3 


[LENGTH] 254 
[MW] 29505.73 
[pI] 6.99 
[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12 
[PROSITE] RGD 1 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 5.12 % 
[KW] COILED_COIL 11.02 % 
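
The [KW] percentages appear to be the fraction of residues flagged in the per-residue tracks of the same report (x in the SEG track, C in the COILS track). This reading is an assumption inferred from the numbers in this entry, not a documented definition; the function name and the reconstructed track strings below are illustrative.

```python
def percent_flagged(track: str, flag: str, protein_length: int) -> float:
    """Percentage of residues carrying `flag` in a per-residue annotation track."""
    return round(100.0 * track.count(flag) / protein_length, 2)


# Counts taken from this report: 13 'x' positions in SEG, 28 'C' positions in COILS.
seg_track = "x" * 13
coils_track = "C" * 28

print(percent_flagged(seg_track, "x", 254))    # compare with LOW_COMPLEXITY
print(percent_flagged(coils_track, "C", 254))  # compare with COILED_COIL
```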


SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT 

SEG 

PRD ccchhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc 

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD 

SEG 

PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM MMMMMMM 


SEQ TCLAIALGFYRFWK 

SEG 

PRD hhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMM 


Prosite for DKFZphtes3_19f19.3 
PS00016 15->18 RGD PDOC00016 
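
The RGD hit can be reproduced with a plain substring search. In the sketch below the coordinates are reported 1-based with the end one past the last residue, which is how the "15->18" span for a three-residue motif reads; that convention is inferred from this entry and is an assumption. The function name is hypothetical, and only the first 30 residues of the peptide are used.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of regex matches, 1-based, end one past the match."""
    return [(m.start() + 1, m.end() + 1) for m in re.finditer(pattern, sequence)]


# First 30 residues of the DKFZphtes3_19f19 frame 3 peptide.
peptide_start = "MNSRQAWRLFLSQGRGDRWVSRPRGHFSPA"
print(find_motif(peptide_start, "RGD"))
```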


(No Pfam data available for DKFZphtes3_19f19.3) 

DKFZphtes3_19j17 


group: testes derived 

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to C. elegans 
Y40B1A.2 protein. 

The novel protein contains two Prosite WW/rsp5/WWP domain signatures. 

The WW domain (or rsp5 or WWP domain) was originally discovered as a short conserved 
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP 
protein, mouse NEDD-4 and yeast RSP5. 

The domain is repeated up to 4 times in some proteins. It has been shown to bind proteins with 
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It 
appears to contain beta-strands grouped around four conserved aromatic positions, generally 
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a conserved 
Pro. It is frequently associated with other domains typical of proteins in signal 
transduction processes. 
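
The proline motif [AP]-P-P-[AP]-Y recognised by WW domains can be searched for by turning the dash-separated pattern into a regular expression. The converter below is a minimal sketch that handles only single residues, bracketed residue classes and the wildcard x (no repeat counts or anchors); the function name and both test peptides are made up for illustration and do not come from this document.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple dash-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        # 'x' means any residue; everything else is already valid regex syntax.
        parts.append("." if element.lower() == "x" else element)
    return "".join(parts)


ligand_regex = re.compile(prosite_to_regex("[AP]-P-P-[AP]-Y"))

# Hypothetical peptides: the first contains PPPPY, the second has no match.
print(bool(ligand_regex.search("GTPPPPYTVG")))
print(bool(ligand_regex.search("GTPAPAYTVG")))
```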

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to C.elegans Y40B1A.2 

there are two long ORFs in this cDNA according to EST: 
HS12146/HS75086/AA923755/MMAA17335 remaining intron at Bp 1506-1733 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp 

Poly A stretch at pos. 2740, no polyadenylation signal found 


1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG 
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT 
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA 
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG 
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT 
1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA 
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGTAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 
1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACAGAAT TCTTCTGCCC GATCCACGTG 
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC 
2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA 
2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC 
2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA 
2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT 
2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT 
2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA 
2451 TTTTTGAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA 
2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA 
2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT 
2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA 
2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA 
2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA 
2751 AAAAAAAAAA AA 


BLAST Results 


Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1188I5 map 10p11.2-10p12.1, 
complete sequence. 

Score = 2130, P = 0.0e+00, identities = 426/426 
12 exons matching Bp 492-2740 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 

1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ 
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS 
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE 
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 


BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 2 
No Alert BLASTP hits found 


Peptide information for frame 3 


ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116) 
WW_DOMAIN_1 (90-116) 


1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY 
51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 
101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 
151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 
201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT 
251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 
301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 
351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR 
401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG 


BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 3 

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid 
Y40B1A, N = 1, Score = 144, P = 1.8e-09 


>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 
Length = 120 

HSPs: 

Score = 144 (21.6 bits). Expect = 1.8e-09, P = 1.8e-09 
Identities = 30/67 (44%), Positives = 43/67 (64%) 

Query: 90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK---DRDYRRE 145 

W+E +SSSGK YYYN +TE+SQW+KP EW E +++ K VN P+ DR Y 
Sbjct: 11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69 

Query: 146 VMQATATS 153 

+ Q +++S 
Sbjct: 70 IGQLSSSS 77 


Pedant information for DKFZphtes3_19j17, frame 2 


Report for DKFZphtes3_19j17.2 


[LENGTH] 209 
[MW] 22873.85 
[pI] 9.95 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 13.40 % 

SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ 

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA 

SEG xxxxxxxxxxxxxxx . .xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ 

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 


(No Prosite data available for DKFZphtes3_19j17.2) 
(No Pfam data available for DKFZphtes3_19j17.2) 

Pedant information for DKFZphtes3_19j17, frame 3 


Report for DKFZphtes3_19j17.3 


[LENGTH] 436 
[MW] 47716.62 
[pI] 8.71 
[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKL012w] 2e-04 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152c] 6e-04 
[BLOCKS] BL01159 WW/rsp5/WWP domain proteins 
[PROSITE] WW_DOMAIN_1 2 
[PFAM] WW/rsp5/WWP domain containing proteins 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 22.48 % 

SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS 
SEG xxxxxx 
PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 
SEG xxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS 
SEG 
PRD hhhhhhhhhhhhcccccccchhhhhhhhhhccccccccccccccccccccccccccc^ 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA 
SEG xxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 
SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL 
SEG 
PRD hhcccccceeehhhhhhhhhhhccccccccccccccccccceeeecccchhhhhhhcc^ 

SEQ VWNGSIMVQRLLQPSG 
SEG 
PRD eeccchhhhhhccccc 


Prosite for DKFZphtes3_19j17.3 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 


Pfam for DKFZphtes3_19j17.3 

HMM_NAME WW/rsp5/WWP domain containing proteins 

HMM *LPsGWEeHWDpsGRpWYYWNHETkTTQWEpP* 

+ ++W EH++ SG+ YY+N T+ +QWE+P 
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP 115 

DKFZphtes3_1c1 


group: signal transduction 

DKFZphtes3_1c1 encodes a novel 632 amino acid putative GTPase-activating protein, related to 
Drosophila rotund transcript and human n-chimaerin. 

[...] associated with type-I phosphatidylinositol 4-phosphate 5-kinase and 
[...] production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find application in modulating/blocking the response to a cellular 
receptor. 


similarity to GTPase-activating proteins 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: unknown 


Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found 


1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 
51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 
101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 
151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT 
201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG 
251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 
301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 
351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 
401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 
451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 
501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 
551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 
601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 
651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 
701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 
751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT 
801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 
851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 
901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 
951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 
1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 
1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 
1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 
1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 
1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 
1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 
1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 
1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 
1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT 
1451 CAGCAAAGTG GATGATATCC ATGCTATCTG TAGCCTTCTA AAAGACTTTC 
1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 
1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 
1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 
1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 
1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT 
1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA 
1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 
1851 ATGATGGTGG AGCAAGAGAA CATTGACCCC CTACATGTCA TTGAAAACTC 
1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 
1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 
2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 
2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 
2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 
2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 
2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT 
2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 
2301 CAAGTCACAT AATATCAGCA CTTTGTGCTG GTCATTGTTG GGAGCTTTTA 
2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 
2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT 
2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 
2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 
2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT 
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CCCCATTATG AACTGAGCTA 
2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGACA GAACTAGTGA 
2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 
2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCGTATT TATCTCTGAT 
2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT 
2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT 
2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 
2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT 
3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 
3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 
3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 
3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 
3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA 


BLAST Results 


Entry U82984 from database EMBLEST: 

Homo sapiens ORES 56 mRNA sequence. 

Score = 8775, P = 0.0e+00, identities = 1757/1758 

matches 3' end 


Medline entries 


93074974: 

Developmental regulation and neuronal expression of the mRNA of rat 
n-chimaerin, a p21rac GAP: cDNA sequence. 

93024458: 

A Drosophila rotund transcript expressed during spermatogenesis and 
imaginal disc morphogenesis encodes a protein which is similar to human Rac 
GTPase-activating (racGAP) proteins. 


Peptide information for frame 3 


ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 


1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE 
51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI 
101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG 
151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK 
201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG 
251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC 
301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM 
351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF 
401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD 
451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF 
501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI 
551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS 
601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK 

BLASTP hits 

Entry CEK08E3_4 from database TREMBLNEW: 

gene: "K08E3.6"; Caenorhabditis elegans cosmid K08E3 

Score = 452, P = 2.6e-48, identities = 126/377, positives = 189/377 

Entry A48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7 - fruit 
fly (Drosophila melanogaster) (fragment) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry B48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit 
fly (Drosophila melanogaster) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 




Entry DM22539_1 from database TREMBL: 

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP 
(rotund) gene, complete cds. 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry S29128 from database PIR: 
N-chimerin - rat 

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253 


Alert BLASTP hits for DKFZphtes3_1c1, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_1c1, frame 3 


Report for DKFZphtes3_1c1.3 


[LENGTH] 632
[MW] 71026.84
[pI] 9.08
[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit fly (Drosophila melanogaster) 2e-46
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR127w] 5e-09
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08
[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08
[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins
[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins
[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain (human (Hom 1e-55
[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain (human (Homo sapiens) 1e-49
[PIRKW] breakpoint cluster region 1e-19
[PIRKW] transmembrane protein 7e-08
[PIRKW] brain 3e-22
[PIRKW] alternative splicing 1e-19
[PIRKW] P-loop 2e-25
[SUPFAM] CDC24 homology 3e-22
[SUPFAM] bcr protein 3e-22
[SUPFAM] myosin motor domain homology 2e-25
[SUPFAM] pleckstrin repeat homology 4e-10
[SUPFAM] LIM metal-binding repeat homology 2e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 9
[PROSITE] ASN_GLYCOSYLATION 1
[PROSITE] DAG_PE_BINDING_DOMAIN 1
[PFAM] Phorbol esters / diacylglycerol binding domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 2.22 %
[KW] COILED_COIL 8.54 %
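
The two [KW] percentages can be cross-checked against the per-residue SEG and COILS rows printed further down in this report. The following is a minimal sketch, not the PEDANT implementation: it assumes the percentage is simply flagged residues divided by the [LENGTH] value, and the function name `masked_percent` is hypothetical.

```python
# Sketch: recompute the keyword percentages from per-residue masks.
LENGTH = 632  # [LENGTH] value reported for this protein

def masked_percent(mask_rows, symbol, length):
    """Percentage of residues flagged with `symbol` (case-insensitive)."""
    flagged = sum(row.lower().count(symbol.lower()) for row in mask_rows)
    return round(100.0 * flagged / length, 2)

# The only non-empty SEG rows printed for this protein (11 + 3 residues).
seg_rows = ["xxxxxxxxxxx", "XXX"]
low_complexity = masked_percent(seg_rows, "x", LENGTH)

# Inverse check: how many coiled-coil residues does 8.54 % imply?
coil_residues = round(8.54 / 100.0 * LENGTH)
coiled_coil = round(100.0 * coil_residues / LENGTH, 2)
```

Under these assumptions the 14 SEG-masked residues reproduce the reported LOW_COMPLEXITY value, and the COILED_COIL value corresponds to 54 of the 632 residues.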

SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK
SEG
COILS CCCCCCCCCCCC
1rgp-

SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE
SEG
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
1rgp-

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR
SEG
COILS
1rgp-

SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESrVAKTTVTVPNDGGPIEAVSTIETVP
SEG
COILS
1rgp-

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC
SEG
COILS
1rgp-

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP
SEG
COILS . .
1rgp-

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS
SEG
COILS
1rgp- . CCHHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCG-GGCCCCHHHHH

SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL
SEG
COILS
1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKWERLLSLPLEYWS
SEG
COILS
1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS
SEG xxxxxxxxxxx
COILS
1rgp-

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK
SEG XXX
COILS
1rgp-


Prosite for DKFZphtes3_lcl.3 


PS00001
PS00004 
PS00004 
PS00004 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00007 
PS00007 
PS00008 
PS00008 
PS00008 
PS00008 
PS00008 
PS00008 
PS00009 


376->385
131->137 
150->156 
276->282 

377->383
388->394 
623->629 
303->307 


144->148 
206->210 
234->238
270->274 
323->327 
387->391 
392->396 
410->414 
449->453 
489->493 
579->583 


174->177 
186->189 
245->248
313->316
392->395
435->438 
595->598 
606->609 


212->216 
141->145 
182->186 
246->250 


46->55 


47->51 
66->70 


63->66 


ASN_GLYCOSYLATION
CAMP_PHOSPHO_SITE
CAMP_PHOSPHO_SITE
CAMP_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
PKC_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
CK2_PHOSPHO_SITE
TYR_PHOSPHO_SITE
TYR_PHOSPHO_SITE
MYRISTYL


MYRISTYL 
MYRISTYL 
MYRISTYL 
MYRISTYL 
MYRISTYL 
AMIDATION 


PDOC00001
PDOC00004 
PDOC00004 
PDOC00004 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00007 
PDOC00007 
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00009 
PS00479 287->336 DAG_PE_BINDING_DOMAIN PDOC00379


Pfam for DKFZphtes3_lcl.3

HMM_NAME Phorbol esters / diacylglycerol binding domain

HMM ♦HrFmrHTFrqPTWCDHCgeFIWGWgKQGYQCQnCgMNCHKRCHelVPinra 

H+F+ +T + P +C CG +1 +GK ++C +C+++ H +C+ + P 
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 334

HMM C* 
C 

Query 335 C 335 
DKFZphtes3_lgl3


group: intracellular transport and trafficking 

DKFZphtes3_lgl3 encodes a novel 1007 amino acid protein with similarity to human 256 kD
golgin.

The new protein contains 7 leucine zippers and seems to be involved in protein-protein
interactions in the Golgi apparatus. The very similar rat cpl51 shows
haploid-specific transcription in Mus musculus testis.

The new protein can find application in modulating protein traffic in the Golgi apparatus,
especially in human haploid germ cells.


similarity to 256 kD golgin, strong similarity to rat "cpl51"

21 exons encoded on AC004682

EST from a testis library, two mouse ESTs of a testis cDNA library,
rat cpl51 shows haploid-specific transcription.
testis or haploid-specific transcription

Sequenced by DKFZ 

Locus: map="16q22.2"

Insert length: 3405 bp 

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373
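
The two positions can be checked against the last printed rows of the cDNA sequence (positions 3351 to 3405). The sketch below is an illustration only: it assumes the signal is the canonical AATAAA hexamer and that a terminal run of at least ten A residues counts as the poly A stretch, and the threshold and the helper name `zero_based` are hypothetical. With the printed one-based numbering, the reported values coincide with zero-based offsets into the sequence.

```python
import re

# Last 55 bases of the insert (printed positions 3351-3405), spaces removed.
TAIL_START = 3351  # one-based position of the first base in `tail`
tail = ("CCATCTCCAG" "GCCCAAACTC" "TGAAATAAAG"
        "ACATGAGCAT" "GAGCAAAAAA" "AAAAA")

def zero_based(one_based_tail_start, offset_in_tail):
    """Convert an offset inside `tail` to a zero-based offset in the insert."""
    return (one_based_tail_start - 1) + offset_in_tail

# Canonical polyadenylation hexamer (assumption: this is the reported signal).
signal_offset = tail.find("AATAAA")
# Terminal run of A residues; the minimum of 10 is an illustrative threshold.
poly_a = re.search(r"A{10,}$", tail)

signal_pos = zero_based(TAIL_START, signal_offset)
poly_a_pos = zero_based(TAIL_START, poly_a.start())
insert_length = TAIL_START - 1 + len(tail)
```

Adding 1 to `signal_pos` or `poly_a_pos` gives the corresponding one-based position in the printed numbering.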


1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT 
51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA 
101 AGGTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG 
151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA 
201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT 
251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC 
301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG 
351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT 
401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT 
451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC
501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA 
551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG 
601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC 
651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA 
701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC
751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT
801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA 
851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT 
901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG 
951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA 
1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG 
1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA 
1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT 
1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC 
1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA 
1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA 
1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAG7UUV 
1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG 
1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG 
1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG 
1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA 
1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA 
1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA 
1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA 
1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA 
1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC
1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT 
1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC 
1901 GTATCAAGCA CCAGCACAGG GAGCAAGGCT CCATCAAATG CAAGTTAGAA 
1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT 
2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT 
2051 TGCGGGAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA 
2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC 
2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA 
2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC 
2251 CTGCAGGCCC AGCTGGAGAA AGCTCTGCAG AAGGAGAAGG ACTATCTCCA 
2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG 
2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC 
2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA 
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC
2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 
2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 
2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 
2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC 
2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC 
2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 
2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC 
2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA 
2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA 
2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 
3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 
3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 
3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC 
3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 
3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 
3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 
3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 
3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 
3401 AAAAA 


BLAST Results 


Entry AC004682 from database EMBLNEW: 

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete
sequence.

Score = 1291, P = 0.0e+00, identities = 265/272


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 133 bp to 3153 bp; peptide length: 1007 
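
The ORF coordinates and the peptide length are consistent with each other, which the following sketch checks. It assumes the coordinates are one-based and inclusive and exclude the stop codon; the codon table is deliberately limited to the six codons needed here, and `translate` is a hypothetical helper rather than part of any annotation tool.

```python
# Printed positions 101-150 of the cDNA, spaces removed.
FRAGMENT_START = 101
fragment = ("AGGTTAAAGG" "AGAATTGAGA" "GCTGCCAAGG"
            "AAATGAAAGA" "TGAGGCGGGG")

ORF_START, ORF_END = 133, 3153  # one-based, inclusive, as reported

# Only the codons that occur in this fragment; not a full genetic code table.
CODONS = {"ATG": "M", "AAA": "K", "GAT": "D",
          "GAG": "E", "GCG": "A", "GGG": "G"}

def translate(dna):
    """Translate complete codons of `dna` using the partial table above."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))

orf_bases = ORF_END - ORF_START + 1
peptide_length = orf_bases // 3
n_terminus = translate(fragment[ORF_START - FRAGMENT_START:])
```

The 3021 bases of the ORF divide evenly into 1007 codons, and translation from position 133 yields the first six residues of the frame 1 peptide.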

Category: similarity to known protein 

Prosite motifs: LEUCINE_ZIPPER (83-105)
                LEUCINE_ZIPPER (90-112)
                LEUCINE_ZIPPER (97-119)
                LEUCINE_ZIPPER (104-126)
                LEUCINE_ZIPPER (403-425)
                LEUCINE_ZIPPER (410-432)
                LEUCINE_ZIPPER (918-940)
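
The first four overlapping hits can be reproduced with a short scan. This is a sketch under stated assumptions: it uses the Prosite leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L written as a regular expression, it works on residues 81 to 130 of the frame 1 peptide as they appear in the BLASTP query lines of this report, and it assumes the listed end coordinate is one past the last leucine of the 22-residue match.

```python
import re

# Residues 81-130 of the frame 1 peptide, spaces removed.
FRAGMENT_START = 81
fragment = ("RQLQQLKKKL" "LVLQQELEFH" "TEELQTSYYS"
            "LRQYQSILEK" "QTSDLVLLHH")

# Leucine at every seventh position, four times. The lookahead keeps
# overlapping hits, which a plain finditer would skip.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")
MATCH_LENGTH = 22

zipper_starts = [FRAGMENT_START + m.start() for m in ZIPPER.finditer(fragment)]
zipper_spans = [(start, start + MATCH_LENGTH) for start in zipper_starts]
```

The heptad spacing means each additional leucine seven residues downstream creates a further overlapping hit, which is why the four N-terminal motifs are offset by exactly seven positions.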


1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA 
51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH
101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN
151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK
201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS 
251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 
301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM 
351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD
401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ
451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 
501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE
551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS 
601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 
651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ 
701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE 
751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF
801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM 
851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 
901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA 
951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH
1001 CPGSSYC 

BLASTP hits
Entry HS417401_1 from database TREMBL: 

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete 
cds.

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862
Entry SCINTANA_1 from database TREMBL: 

Saccharomyces cerevisiae integrin analogue gene, complete cds.
Score = 404, P = 6.2e-34, identities = 199/891, positives = 423/897

Entry HS6802_2 from database TREMBL: 

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain,
ESTs, CA repeat, STS and GSS.

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028 
Entry AF092090_1 from database TREMBL: 

product: "cpl51"; Rattus norvegicus cpl51 mRNA, partial cds. 

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733


Alert BLASTP hits for DKFZphtes3_lgl3, frame 1

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin,
N = 1, Score = 411, P = 4.4e-34

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230
mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene,
complete cds., N = 1, Score = 404, P = 7.1e-34


>TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin
Length = 2,185

HSPs:

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34
Identities = 212/816 (25%), Positives = 420/816 (51%)
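
The percentages in the HSP headers of this report appear to be truncated to whole numbers rather than rounded. The sketch below illustrates that reading with integer division; `blast_percent` is a hypothetical helper, and the truncation rule is inferred from the printed values, not taken from the BLAST documentation.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated rather than rounded."""
    return (100 * matches) // aligned_length

# Values from the first HSP against HSGOLGIN_1.
identities = blast_percent(212, 816)   # 25.98... is reported as 25
positives = blast_percent(420, 816)    # 51.47... is reported as 51

# A later HSP in the same report: 209/953 is 21.93... and is printed as 21.
later_identities = blast_percent(209, 953)
```

Rounding to the nearest integer would give 26% and 22% for the two identity values, which does not match the printed headers.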


Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203

+M + E+ G L +EQL ++ +ERSL4- YR KY ++ ++L+ + K LQ 

Sbjct: 119 OMDSEAEDLVGNSDSLNKEQLI QRLRRMERSLSSYRCKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRI YTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262 

G 1+ Q D S RI +Q Q+ +K L E + + ++D I L+ + 

Sbjct: 176 G ILSQSQ DKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227 

Query: 263 LAC SNALVLEREKALIKLQADFASCTATHRYPPSSSEEC-ED--IKKILKHLQE 313 

++ + + ++ K L +L+ A P S E ED K L+ LQ+ 

Sbjct: 228 VSLLKQRLRNGPMNVDVLKPLPQLEPQ-AEVFTKEENPESDGEPVVEDGTSVKTLETLQQ 286 

Query: 314 QKDSQ CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ Q C +■»- ++ L E EA+ EQ ++++ K++ DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 344 

Query: 367 HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV--QNSL 424 

+ +D I 0 Q+ + ET++ + + L+ K+E + +L ++ Q+ Q 
Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVI AETKR QMHETLEMKEEEIAQLRSRIKQMTTQGEE 400

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 
Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ--KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL ++LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK— LHEKELARKEQELTKKLQTRERE— FQEQMK 512 

Query: 543 TSNRKRVEELSLELSEALRKLENSDKEKRQLQKT— VAEQDMKMNDMLDRIKHQHREQGS 600 

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ + 
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLROLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED KREQLKKSKEHEKLMEG ELEALR-QEFKKKDKTL 651 

++ LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE lALQKESLMS 706 

+ ++ + E E LR + C E+ L +K Q I+++N++ + +++ L S 
Sbjct: 632 QVLKQQYQTEMEKLREK CEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDVKQTELES 688 

Query: 707 LQAQLDKALQKEKHYLQT— TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764 
L ++L + L K +H L+ ++ K+ D + ++ A D+ Q V S K + 
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824 

S+ +T+ KA L+++I E +K+ + L++ + + E ++ + +L++ S ++ 
Sbjct: 746 SIQRTE— KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802 

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ + E L + + KD C 

Sbjct: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937 

L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + -f-l-fCN 

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT— QV-YESKLEDGNKEQEQTKQILVEKENM 912 

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 
Sbjct: 913 ILQMREGQKKEIEILTQKLSAK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26
Identities = 216/953 (22%), Positives = 468/953 (49%) 

Query: 2 KDEAGERDRE~VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM EEAM 51 

K+E E D E V S K L +LQ +K ++ KR ++T+Q + + C +EA+ 
Sbjct: 260 KEENPESDGEPWEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319 

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ LQQLK— KKLLVLQQELEFHTEELQ 105 

D++ + ++ -I- + LR ++QL+ K +++ + + + H E L+ 

Sbjct: 320 QEQLDERLQELEKIKDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMH-ETLE 378

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164 

+ Q +3 +++ T+ L K K + EE +T +K A+ +L 

Sbjct: 379 MKEEEIAQLRSRIKQMTTQGEELREQ-KEKSERAAFEELEKAL STAQKTEEARRKLK 434 

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222

A ++I ++E++ R Q LS + E+++ K + ++ + Q+ K K 
Sbjct: 435 AEMDEQIKTIEKTSEEERISLQQELSRVKQEWDVMKKSSEEQIAKL — QKLHEKELARK 492 

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282 

+ T +E +E Q+++ +K SQ + L ++ + +L LE ++LQ 
Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL— KISQEKEQQESLALEE LELQK 544 

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341 

AT+ +EE + + L+ + ++E +N KDL V LEA ++ 

Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS LQENKNQSKDLAVHLEAEKNK 600

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400 

+1 +K++LL++A K++ Q +++L+ E E +K TL KD 
Sbjct: 601 HNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659 

Query: 401 K FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453 

K ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK 

Sbjct: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717 

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511 

K E E K + + + ++ ++•♦•+ Q+ + K++ L++ + L++ 

Sbjct: 718 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQZNQLELLLKERDKHLKEHQ 776 

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570 

++ +AD 1+ + ELQ + + + Q++ ++ + +L++ +KL + + E+ 
Sbjct: 777 AHVENLEAD-IKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835 

Query: 571 RQLQKTVAEQDMKMNDM LD--RIKHQHREQGSIK— CKLEEDLQEATKLLEDKREQL 623 

L K VAE + + D+ LD +1+ Q Q K ++E+ ++ T++ EKE 
Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895 

Query: 624 KKSKEHEK--LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681 

K +E K L+E E L+ +K K ++ ++KL + +++ + T+ ++ 
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954 

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741 

K +Q +++ + + K+ L+ +A+L K L E L+ + ++ ++A + A 
Sbjct: 955 KMEKVKQKAKEMQETL KKKLLDQEAKLKKEL— ENTALELSQKEKQFNAKMLE^4AQA 1009 

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800

++ A+ +L T++ + ++ SLT+ + +L + I +E KKLN + +L+ 
Sbjct: 1010 NSAGISDAVSRLE — TNQKEQIE-SLTEVHRR— ELNDVISIWE KKLNQQAEELQEI 1061 

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW--QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858

H E+++ ++++ E+ ++L + +K+ N ++ KEE +++ + L+E L 
Sbjct: 1062 H EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116 
Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ--WAKQQKVANEKLGNQLREQVNYI- 915

+ L Q K L + + +L++ + ++Q V + L + + +V+ + 

Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951 

+KL + S+ ++ NK L+ K +E KK+ E 
Sbjct: 1176 SKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE 1212 

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26
Identities = 215/951 (22%), Positives = 433/951 (45%) 

Query: 10 REVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALAFEESE 69 

+E + +++L L+++ KQKL+ EA+ H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL EAEKNKHNKEIT— VMVEKHK 613 

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129 

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK— H-QQDALWTEKLQVLKQQYQTEMEKLREK CEQEKETLLKD-KEI 1 FQA 666 

Query: 130 HHCKLKE DEVILYEEEMGNHNENTGEKL HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 
Sbjct: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K YQSSLSNIELLECQVKMLQGE— LGGIMGQEPENKGDHSKVRIYTSPCMIQEHQE 237 

+ +Q++I+E+V + + EL +Q +K+ ++ + 

Sbjct: 727 EQKNHHQQQVDSI-IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
Sbjct: 785 DIKRSEGELQQASAKLDVFQSYQS ATHEQTKAYEEQLAQLQQKLLDLE-TERIL 837 

Query: 293 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE + + +K + ++ E 
Sbjct: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891 

Query: 357 LH-GLREETSAHIERKDKDITILQCRL-QELQLEFTETQKLTLKKDKF — LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+ ++E TQKL+ K+D L E+ E + 

Sbjct: 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEIL-TQKLSAKEDSIHILNEEYETKFK 950 

Query: 412 ELEKKLTQVQNSLLK KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466 

EKK+ +V+ + K+K L+++ + ELE T E Q K K+ K L+ Q 

Sbjct: 951 NQEKKMEKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQ-KEKQFNAKMLEH-AQ 1008 

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526

+ +A RL Q Q + + L D +K L Q+A+ +QE+ 
Sbjct: 1009 ANSAGISDAVS--RLETNQKEQIESLTEVHRRELNDVISIWEKKL NQQAEELQEIH- 1062 

Query: 527 ELQMLQKESSMAEKEQT SNRKRV EELSLELSEALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWI.KEEGVKQDTTLNELQEQIiKQKSAHV 1122 

Query: 575 KTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L++++ + L+E LE LE+++++ K+ 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182 

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E ++L+ +K +K+L++ S + ++ +E L +L C + E+ L T++ + + 
Sbjct: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT QALE 750 

K A+ + Q + K KE ++T E +A R+ Q+ L QA 
Sbjct: 1242 KTNAILSR-ISHCQHRTTKV--KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLN TELRK— LRGFHQESE 805 

+L ++ KS++ + +K L++E ++ + T+L+K + + 

Sbjct: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTL 1357 

Query: 806 LEVHAFDKKLE — EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863 

++ +KK+E +S Q+ Q QN + L+ KE + +++ K LL D + 

Sbjct: 1358 MKEELKEKKVEISSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE— QVNYIAKLSG 920 

++ K+ D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475 

Query: 921 EKDH-LHSVMVHLQQENKK LKKEIEEKKMKAE 951 

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQINLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25
Identities = 209/953 (21%), Positives = 438/953 (45%)
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNS SHD 56

MK + E+ ++ L+ K L+ + + + + R+R+ + ++ +E++ + S + 

Sbjct: 470 MKKSSEEQIAKLQKLHEKELARK-EQELTKKLQTREREFQEQMKVALEKSQSEYLKISQE 528 

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116 

K+Q ++LA EE E++ K+ L + + KL LQQE E + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAILTESEN— KLRDLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE--VILYEE-----EMGNHNENT--GEKLHLAQEQLALA 167

+ 0+ DL + K K ++ ++ E+ E H ++ EKL + ++Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHOQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL--ERSLNLYRDK YQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

+K+ + L +DK +Q+ + N + LE ++ + Q EL + + E 

Sbjct: 642 MEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700 

Query: 221 SKVRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ S ++++ +T K E+ K+ +Q + Q+ + + + + ++R + +K 
Sbjct: 701 HKLEEELS— VLKD--QTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKD 756 

Query: 281 QADFASCTATHR—YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+++ +K + + ++ +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398

EQ + + ++ LE + L ++ A +E + KD+ C EL + Q L + 

Sbjct: 817 YEEQLAQLQQKLLDLETERILLTKQV-AEVEAQKKDV CT — ELDAHKIQVQDLMQQ 869

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 
+K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 

Sbjct: 870 LEK QNSEMEQKV-KSLTQVYESKLEDGNKEQEQTKQILVEKENMILQMREGQKKEIE 925 

Query: 458 C— KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK—LQKGLLL 513 

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKAK EMQETLKKKLLDQEAKL 981 

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572

K+ +T EL Q+E Q K MA+ V L E + L ++ +R+ 

Sbjct: 982 KKELENTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL— TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKS KE 628 

L ++ + K+N + ++ H Q K + +L++ L ++E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688 

+ L L+++ K+K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV— NS— LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL DKALQ— KEKHYLQTTITKEA YDALSRKSAA 740 

+ +L K + L ++L D+QKH ++ +LS+A 

Sbjct: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKL 790 

Q D+ T+AL E +N +S+T ++ ++ Q + +++E ++ + +L 

Sbjct: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKBALLIKTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+LR+L + +LEE Q+ K + D++ L ++E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNISFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

Sbjct: 1328 G— NQQQAASEKESC-ITQ— -LKKELSE NINAVTLMKEELKEKKVEISSLSKQLTD 1378 

Query: 911 QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951 

Q+ LS ++ + S+ +E +L ++++ K + 

Sbjct: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422 

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25
Identities = 226/941 (24%), Positives = 444/941 (47%)

Query: 61 QALAFEESEVE — FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

Q L E+ +++ S+ LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENT GEKL HLAQEQLALA 167 

+ QSL + + D+ ++E+ EN GE+ + + L 

Sbjct: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284 

Query: 168 GDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYT 227 
+ + EL ++ QS LL + + LQ +L + QE E D + + 
Sbjct: 285 QQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERL-QELEKIKD LHMAE 340 

Query: 228 SPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASC 287 

+1 +++++++ Q +1 E + ++ L ++ E+ + +L++ 

Sbjct: 341 KTKLITQLRDAKNLIEQLEQDKGM VIAETKRQM--HETLEMKEEE-IAQLRSRIKQM 394 

Query: 288 TATH RYPPSSSEEC--EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL RVE 335 

T R SE E+++K L Q+ ++++ E +K + R+ 

Sbjct: 395 TTQGEELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERIS 454 

Query: 336 LEA-VSEQKRNIMKDMMKL— ELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTET 392 

L+ +S K+ ++ D+MK E + L++ + RK++++T +LQ + EF E 
Sbjct: 455 LQQELSRVKQEW-DVMKKSSEEQIAKLQKLHEKELARKEQELTK KLQTREREFQEQ 510 

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452 

K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+ 

Sbjct: 511 MKVALEKSQ— SEYLKISQEKEQ QESLALEELELQKKAIL-TESENKLRDLQQE- 561 

Query: 453 SKEAECKALQAEVQKLKNSLEEAKQQER LAAQQAAQCKEEAALAGCHLEDTQR-K 506 

++ + L+ E L+ SL+E K Q + L A++ KE + H + + K 

Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLK 620 

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRK-LEN 565 

Q+ L ++ Q+ Q E++ L +E EKE K + + E K LE 

Sbjct: 621 HQQDALWTEKLQVLKQQYQTEMEKL-REKCEQEKETLLKDKEII-FQAHIEEMNEKTLEK 678

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR--E 621

D ++ +L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K + 
Sbjct: 679 LDVKQTELESLSSE LSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQ 733 

Query: 622 QLKKS—KEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN ENLRAELQCCSTQL 676 

Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ +L 

Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793 

Query: 677 ESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736

+ + K + Q ++■»- +E L LQ +L 1,+ E+ L TK+ + ++ 
Sbjct: 794 QQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERILL TKQVAEVEAQ 848 

Query: 737 KSAACQD DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ--LEEEIIAYEE 785

K C + DL Q LEK N SE + +SLTQ E K + ♦£+ + 
Sbjct: 849 KKDVCTELDAHK IQVQDLMQQLEKQN SEMEQKVKSLTQVYESKLEDGNKEQEQTKQI 905

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVL--QWQKQHQNDLKMLAAKEEQL 843

++K N L+ G Q+ E+E+ +E S +L +++ + +N K + +++ 

Sbjct: 906 LVEKENMILQMREG--QKKEIEILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKA 963 

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899 

+E QE LK+ LL+ + + L++ L+ Q ++A+ 

Sbjct: 964 KEMQE TLKKKLLDQEAK LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK— EIEEKKMKAENTRL 955 

A +L +EQ+ + ++ E + + S+ L Q+ ++L++ EI+ ++ + E L 
Sbjct: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHBIQLQEKEQEVAEL 1076 

Query: 956 CTKALGPSRTESTQREKVCGTLGWKGLPQD 985 

K L E + K L +G+ QD 

Sbjct: 1077 KQKIL-LFGCEKEEMNKEITWLKEEGVKQD 1105 

Score = 326 (48.9 bits), Expect = 6.0e-25, P = 6.0e-25
Identities = 220/907 (24%), Positives = 444/907 (48%) 

Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILE KQTS 123 

E+E G+S + QL Q +++ EL T+Y L++ + L+ Q+ 

Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182 

Query: 124 DLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNLYRD 183 

D L +L+E+ + +++ H + E+ + E+ 1+ L+ ++L + 

Sbjct: 183 DKSL-RRIAELREE— LQMDQQAKKHLQ EEFDASLEE KDQYISVLQTQVSLLKQ 233 

Query: 184 KYQSSLSNIELLECQVKMLQGELGGIMGQE-PENKG DHSKVR-IYTSPCMIQEHQ 236 

+ ++ N+++L+ + L+ + +E PE+ G D + V+ + T ++ + 

Sbjct: 234 RLRNGPMNVDVLK-PLPQLEPQAEVFTKEENPESDGEPWEDGTSVKTLETLQQRVKRQE 292 

Query: 237 ETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPS 296 

KR E Q +Q L+ K A L ER + L K++ D T 

Sbjct: 293 NLLKRCKETIQSHKEQCTLLTS— EKEALQEQLD-ERLQELBKIK-DLHMAEKTKLIT— 346 

Query: 297 SSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELD 356 

+ D K +++ L++ K + E + + + L++E++QR++KM + 
Sbjct: 347 QLRDAKNLIEQLEQDKGM— VIAETKRQMHETLEMKEEEIA-QLRSRIKQMTTQGEE 400 



Query: 357 LHGLREETS-AHIERKDKDITILQCRLQE LQLEFTETQKLTLKKDKFLQEKDEMLQ 411 

L +E++ A E +K ++ Q + +E L+ E E K T++K +E+ + Q 

Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TIEKTSE-EERISLQQ 457 

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+++ K Q + E E+ KE Q+ +K+ + + + + Q +K 
Sbjct: 458 ELSRVKQEWDVMKKSSEEQIAKLQKLH-EKELARKE— QELTKKLQTREREFQEQ-MKV 513 

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+LE++ QEL Q++EAL L+ ++LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587 

+ ES+E+S VLE++ +++ +K K +L+ +QD + 
Sbjct: 573 ES-SLEKSLQENKNQSKDLAVH-LEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE— QLKKSKEHEKLMEGELEALRQEF 644 

L +K Q++ E ++ K E QE LL+DK Q + +EK +E +L+ + E 
Sbjct: 631 LQVLKQQYQTEMEKLREKCE QEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTEL 686 

Query: 645 KKKDKTLKE— NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE— lA 698 

+ L E +R KLEEE L+ + +LE+ +++ N QQ + + KE ++ 

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVS 746 

Query: .699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS AACQDDLTQAL 749 

+Q+ E + Q QL+ L +++KH + E +A ++S A+ + D+ Q+ 

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 806 

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ ++ L Q Q+K LE E I +++ ++ + + + +++V 
Sbjct: 807 QSATH—EQTKAYEEQLAQLQQKLLDIiETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864 

Query: 810 AFDKKLEEMSCQVLQWQKQHQN— DLKMLAAKEEQLREFQEEMAALKENLL EDDKE 863 

++LE+ + ++ Q K + K+ +EQ E +++ KEN++ E K+ 

Sbjct: 865 DLMQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ— EQTKQILVEKENMILQMREGQKK 922 

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE— KLGNQLREQV-NYIAK 917 

L Q S +D+ + N++ T + Q K +KV + ++ L++++ + AK 
Sbjct: 923 EIEILTQKLSAKEDSIHIL— NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980 

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKALGPSRTESTQREKV 973 
L K L + + L Q+ K+ ++ E M N+ + A+ SR E+ Q+E++ 

Sbjct: 981 L KKELENTALELSQKEKQFNAKMLE—MAQANSAGISDAV— SRLETNQKEQI 1029 

Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24 
Identities = 184/827 (22%), Positives = 405/827 (48%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59 

++ E G + + S S + L+ ++ + ++ L++ ++ + D Q 

Sbjct: 1323 LQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q +++ EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSISLSEKEAAISSLR KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438 

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIA 172 

E K+ + H +KE ++ L + + ++ E+++L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD— EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNIEL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRIYTSPC 230 

L+ + + K + SN+E L+ Q + EL + Q+ E + + ++ Y 
Sbjct: 1497 CLKGEMEDDKSKMEKKESNLETELKSQTARIM-ELEDHITQKTIEIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290 

I EH+E ++L + ++D+ ++E K+ L LE + +K + + 

Sbjct: 1556 DI-EHKELVQKLQHFQELGEEKDNRVKEAEEKI LTLENQVYSMKAELETKKKELE 1609 

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + L+ + ++ ++ + + + ++ +L + E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEK EE 1664 

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL— TLKKDKFLQEKD 407 

K + H E + ++ +++++ IL+ +L+ ++ +ET + + K E++ 
Sbjct: 1665 QYKKGTESH—LSELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQE 1722 

Query: 408 EM LQEL-EKKLTQVQNSLLKKEKEL EKQQCMATELEMTVK-EAKQDKSKE 455 

E +Q+ E+K++ +Q +L +KEK L EK++ +++ EM + + + K + 

Sbjct: 1723 EADSQGCVQKTYEEKISVLQRNLTEKEKLLQRVGQEKEETVSSHFEMRCQYQERLIKLEH 1782 

Query: 456 AECKAL--QAEVQKLKNSLEEAKQQERLAAQQAAQCK— EEAALAGCHLEDTQRKLQKGL 511 
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Sbjct: 
 AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L 
Sbjct: 1783 AEAKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842 

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS— LELSEALRKLENSDKE 569 
 ++K T Q L+++ + + L +S + +++ +R +EEL+ E +AL++++ +K 
Sbjct: 1843 — QEKELTCQILEQKIKEL— DSCLVRQKEV-HRVEMEELTSKYEKLQALQQMDGRNKP 1896 

Query: 570 KRQLQKTVAEQD— MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625 
 + +L ++ QH + E + Q+ K + ++ L+ 
Sbjct: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685 
 KEH++ ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 
Sbjct: 1957 RKEHQQ ELEILKKEYDQ EREEKIKQEQEDL— ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744 
 Q Q+L I ++A+L ++ Q+E + L lEDLR+A ++ 
Sbjct: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMK— KLNTELRKLRGFH 801 
 + A E+ +T++ + LQ L + Q+K Q LE+E + + +L T+L + 
Sbjct: 2062 ILDAREE— EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119 

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 
 +S+L+ F +++ + ++ +++K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 


Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24 
Identities = 213/977 (21%), Positives = 454/977 (46%) 

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61 

E R+ +V S+ K L+ Q + ++ +H++ + QK + +L + + ++ + 

Sbjct: 1034 EVHRRELNDVISIWEKKLNQQAEELQEIHEI-QLQEKEQEVAELKQKILLFGCEKEEMHK 1092 

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL VLQQE— LEFHTEELQTSYYSLRQY 114 

+ + + E G + L +LQ QLK+K + Q E L+ H E+L+ + 

Sbjct: 1093 EITWLKEE— -GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLNKSLKE 1149 

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEV- — ILYEEEMGNHNENTGEKLHLAQEQLALAGDKI 171 

+ L++Q +L +L K K E+ + +E +++ EK + + E +L K+ 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKL 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS~NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228 

+ L K ++ L EL+ LI +++ K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINISSSKTNAILSRI— SHCQHRTTKVKEALLIK 1267 

Query: 229 PCMIQEHQ ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280 

C + E + E Q L+ +Q+ + Q ++ ++++ A +LV E+E L 
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKENQIKSMKADIESLVTEKEA L 1323 

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q + + + S E C I K L E ++ L EE +K+ +VE+ ++S 
Sbjct: 1324 QKEGGN QQQAASEKESC — ITQLKKELSENINAVTLMKEE LKEKKVEISSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQEL— QLEFTETQKLT-L 397 

+Q ++ + + L S+ ++ D++ L ++Q+L +++ +K++ L 

Sbjct: 1374 KQLTDLNVQLQN-SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK-DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV— KEAKQDKS 453 

++ D + + E ++ + + TQ QN++ + + +LE + A E + + KE ++ 
Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQN 1492 

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512 
K +C + E K K +E+ + L +Q A + E + +E ++ ++ K 

Sbjct: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

++QK +EL ++LQ Q+ + +++ l ++ +LE KE 

Sbjct: 1552 -NQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL EDKREQLKKSK 627 

+ +V ++ ++ + DR++ + + +K K E+ + K L E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYiUCGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL RAELQCCSTQLESSLNK 682 

E EL QE +++ L+E + +E ++E L A+ T+ E + ++ 

Sbjct: 1671 ESHL SELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727 

Query: 683 —YNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739 

T ++ I L + + +KE L4- Q+K H+ +E L A 

Sbjct: 1728 GCVQKTYEEKISVLQRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLIKLEHAEA 1785 
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Query: 740 ACQDDLTQALEKLNHVTSET--KSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKL 797 

+D Q++ + H+ E K+ + SL Q + + + I ++ ++ + +++K 
Sbjct: 1786 KQHED--QSM--IGHLQEELEEKNKKySLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKT 1841 

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856 

QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K 

Sbjct: 1842 L— QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKyEKLQALQQMDGRNKPT 1897 

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915 

LLE++ E pk + ++ + l A+++K +KLG ++ + 

Sbjct: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK— QKLGKEIVRLQKDL 1953 

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972 

L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+ 

Sbjct: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNSr — LKQLMREFNTQLAQKEQ 2010 

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22 
Identities = 221/952 (23%), Positives = 441/952 (46%) 


Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL CMEEAMNSSHD- 56 

+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D 
Sbjct: 1160 LKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219 

Query: 57 --KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT EELQTSYY 109 

KK L + +E + SSK L ++ + + +++ L T EL+ 
Sbjct: 1220 CCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279 

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE— QLAL 166 

L + Q+ L H + KE+++ + ++ EK L +E Q 

Sbjct: 1280 QLTEEQNTLNISFQQAT HQLEEKENQIKSMKADI ESLVTEKEALQKEGGNQQQA 1333 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

A +K E + + + +++ + L++ ++K + E+ + Q + V++ 
Sbjct: 1334 ASEK ESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384 

Query: 227 TSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFAS 286 

S + ++ + ++ + D +Q+L K+ + L E+ AL ++ D+++ 
Sbjct: 1385 NSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKV DTLSKEKISALEQVD-DWSN 1440 

Query: 287 CTATHRYPPSS — SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337 

+ + S ++ +K++ L E K + +E NL+K+ R + L+ 

Sbjct: 1441 KFSEWKKKAQSRFTQHQNTVKELQIQL-ELKSKEAYEKDEQINLLKEELDQQNKRFDCLK 1499 

Query: 338 AVSEQKRNIM-KDMMKLELDLHGLRE— ETSAHIERKDKDITILQCRLQEL-QLEFTET 392 

E ++ M K LE +L E HI +K +1 L L+ Q + E 

Sbjct: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559 

Query: 393 QKLTLKKDKFLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449 

++L K F + EKD ++E E+K+ ++N + + ELE ++ + ++VK 
Sbjct: 1560 KELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVK 1616 

Query: 450 QDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQK 509 

SKE E KAL+ ++ S + + +R A Q+ A K++ +E+ + + +K 

Sbjct: 1617 SKEEELKALEDRLES— ESAAKLAELKRKAEQKIAAIKKQLL SQMEEKEEQYKK 1668 

Query: 510 GLLLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568 

G + +T +QE +RE+ +L+++ EQ+ + S+ A+E+D 

Sbjct: 1669 GTESHLSELNTKLQEREREVHILEEKLKSVESSQSETL— IVPRSAKNVAAYTEQEEADS 1726 

Query: 569 E KRQLQK-TVAEQDMKMND-MLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQ 622 

+ K +K +V ++++ + +L R+ Q +B+ ++ E Q +L+ K E 
Sbjct: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI— KLEH 1782 

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680 

+ +K+HE +MGLEL++KK +++ KE N++A+ LE 
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QNLE 1832 

Query: 681 NKYNTSQQVIQDLNKEITILQKESLMSLQAQLDKAL— QKEKHYLQTTITKEAYDALSR-K 737 

H ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L + 

Sbjct: 1833 NVFDDVQKTLQE— KELTCQ— ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888 

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMKKLNTEL— 794 

++ T+ LE+ S++ +Q L E + LE ++ E +KL E+ 

Sbjct: 1889 QMDGRNKPTELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVR 1948 

Query: 795 — RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852 

+ LR +E + E+ K+ ++ + ++ Q+Q +LK + ++ +REF ++A 
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLMREFNTQLAQ 2007 

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912 
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++ Ij KE Q V + + Q TN Q K K+A EK + R 
Sbjct: 2008 KEQELEMTIKETINKAQ-EVEAELLESH- QEETN-QLLK--KIA-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952 

. ^ ^ ++ + + + LQ + ++L+K+ V+K + EN 

Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22 
Identities = 195/961 (20%), Positives = 435/961 (45%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN— LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 
 +KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 
Sbjct: 657 LKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 
 +E E + K H +Q+ + K+ V +Q+ + +++ L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG-— NHNENTGEKLHLAQEQLALAGDKIASL 174 
 L++ + + L K E E+ ++ ++ T E+ +EQLA K+ L 
Sbjct: 772 

Query: 175 
Sbjct: 832 

Query: 234 
 + ++ K + Q + +++++I ++R ++ + +E ++ L ++ + 
Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHILNEEYET 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 
 + E +K+ K +QE + L E L K+L +S++++ 
Sbjct: 948 — KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA— KLKKELENTALELSQKEKQFNAK 1002 

Query: 350 MMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 
 M+++ + + G+ + S +K++ ++ + +EL + +K ++ + LQE 
Sbjct: 1003 MLEMAQANSAGISDAVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIH 1062 

Query: 408 EM-LQELEKKLTQVQNSLLK— -KEKELEKQQCMATE LEMTVKEAKQD-KSKEAEC 458 
 E+ LQE E+++ +++ +L +++E+ K+ E + T+ E ++ K K A 
Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+h + KLK LE+ + + ++ +E+ E+ +RK+ + L K K 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE-LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

^ Q +K + E + +K EEL+++L +K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN— ELIM 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

K N +L RI H QHR K++E L T + + QL++ E + + 

Sbjct: 1238 ISSSKTNAILSRISHCQHRTT KVKEALLIKTCTVSELEAQLRQLTEEQNTLNISF 1292 

Query: 638 EALRQEFKKKD KTLKENSRKLEEENENLR AELQCCSTQLESSL 680 

+ + ++K+ K++K + L E E L+ +E + C TOL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK +++ K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +++ E +++ ++L++ + + + + ++D + KE L E++A+E 
Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916 

i'^Xu + + T + N+ ++ N Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLKE-VLKNYNQ QKDIEHK ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

ew . ^^^'^ + E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22 
Identities = 207/886 (23%), Positives = 412/886 (46%) 
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Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106 

+ EN++ Q EEE+SK ++L + LQ+E + 

Sbjct: 1281 LTEEQNTLNISFQQATHQLEEKENQIKSMKA DIESLVTEKEALQKEGGNCXJQAASE 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ +++ + N + L++++ A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLMVQLQNSISLSEKEAA- 1395 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

I+SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWK-K 1447 

Query: 227 TSPCMIQEHQETQKRLS EVWQKVSQQDDLIQEL--RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI— TILQCRLQEL 385 

LV+ L+ + + E+K N +K+ + L L A +E K K++ L + +E 

Sbjct: 1562 LVQKLQ-HFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKBCKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+L+ E + L+ + E+ ++ E+K+ ++ LL + +E E+Q TE ++ 

Sbjct: 1621 ELKALEDR LESES-AAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGTESHLSE 1676 

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504 

K + +E E L+ +++ +++S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYE 1735 

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E ++L A K 
Sbjct: 1736 EKIS VLQRNLTEKEKLLQRVGQ— BKEETVSSHFEM— RCQYQERLIKLEHAEAKQH 1788 

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG— SIKCK — LE EDLQ E 611 

LQ+ + E++ K + ++ +H +E G +1+ K LE +D+Q E 

Sbjct: 1789 EDQSMIGHLQEELEEKNKKYSLIV— AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846 

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T ++LE K ++L +K + E+E L +++K + + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMEXSRNKPTELLEENTEE 1906 

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L-I-KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDL-HMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYE 784 
I K+ YD R+ 0+ + LE L H ++ + +++ TQ +K+ +LE I + 

Sbjct: 1965 EILKKEYDQ-EREEKIKQEQ--EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E + K +L HQE E + KK+ E + + K+++ ++L A+EE++ 
Sbjct: 2018 ETINKAQEVEAELLESHQE ETNQLLKKIAEKDDDLKRTAKRYE EILDAREEEMT 2071 

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903 

++ EL+++ LQ PD + ++TLQK +++ K 

Sbjct: 2072 AKVRDLQTQLEELQKKYQQK — LEQEENPGNDNVTIM ELQTQLAQ— KTTLISDSK 2123 

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ++ + +L + ++++ V HL 
Sbjct: 2124 LKEQEFREQIHNLEDRLKKYEKNVYATTVGHL 2155 

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20 
Identities = 209/938 (22%), Positives = 432/938 (46%) 

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

++ ++ +E+ +L KLL + +K L + + +K Q N +E A NS+ 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + +++L EELQ + ++ + 
Sbjct: 1017 AVSRLETNQKE-QIESLTEVHRRELNDV ISIWEKKLNQQAEELQ-EIHEIQLQEK — 1069 

Query: 119 EKQTSDLV— LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E++ ++L +L C+ +E ++ I + +E G + T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI-'-MGQEPENKGDHSKVRIYTSPCMIQ 233 
L + +K + L N L E LQ +L + + +E + K ++ T+ Q 
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Sbjct: 

Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT— FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186 

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL— AC— SNALVLEREKALIKLQADFA 285 
 H+++ K L + K + L +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 
 + + + + + + ++Q + E QN + + E+K 
Sbjct: 1244 NAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKE 1303 

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402 
 N +K M K +++ L +E + + + + + +L+ g +£ ^rp^ k++ 
Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462 
 L+EK + L K+LT + NL+ L+++ +L EK+ ++ L 
Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ— DLS 1418 

Query: 463 AEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 
 +V h A +Q + + ++ K++A ++T ++LQ h L ++A 
Sbjct: 1419 FKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 
 + I L+ EL K + E ++ ++E+ L +L++ +L+ + 
Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 
 ++ +++ + + +K+ + +Q I+ K L + LQ +L E+K ++K+++E +E ++ 
Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594 

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 
 +++ E + KKL+ + ++ + E L+A L+ +LES S K ++ + ++ 
Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE DRLESESAAKL AELKRKAEQK 1647 

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756 
 IA K+ L+S Q++ +KE+ Y + T + L+ K + ++ EKL V 
Sbjct: 1648 IAAIKKQLLS QME EKEEQYKKGT— ESHLSELNTKLQEREREVHILEEKLKSVE 1699 

Query: 757 S-ET KSLQQSLTQTQEKKAQLEEEII-AYEERMKKLNTELRKLRGFHQESELEV 808 
 S ET +S + T++++A + + YEE++ L L E E + 
Sbjct: 1700 SSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752 

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 
 ++ EE + + Q+Q L L E + E Q + L+E L E +K+ + 
Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLIV 1812 

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYIAKLSGEKDHL 925 
 V K+ + N Q NLE + QK EK L Q+ EQ + + + + 
Sbjct: 1813 AQHVEKEGGK— -NNIQAKQNLENVFDDVQKTLQEKELTCQILEQKIKELDSCLVRQKEV 1869 

Query: 926 HSV-MVHLQQENKKLK 940 
 H V M L + +KL+ 
Sbjct: 1870 HRVEMEELTSKYEKLQ 1885 


Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14 

Identities = 160/716 (22%), Positives = 318/716 (44%) 


Query: 233 
Sbjct: 53 
Query: 290 
Sbjct: 113 
 +E +TQ 
 KL+ 
 E ED+ 
 G+ ++ 
 + L + 
 L +++ Q L 
 S++ L R 
 L + D S TA 

Query: 350 
 +DK + I + R +ELQ++ + L + D L+EKD+ 
Sbjct: 171 -QDKSLRRI AELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219 

Query: 409 
Sbjct: 220 
Query: 466 
Sbjct: 279 
Query: 522 
Sbjct: 339 
 L+ +++ ++ L 
 + L+ + K+QE L 
 ++ + + +LE 
 ++ Q KE+ L 
 + ++ E++ + + 
 E Q +L + L L+K K 
 L+ 
 ++ E+ + 
 + E ++ E L 
 + R K + 
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Query: 582 MKMNDMLDRIKHQHREQGSIKCKLEEDLQEAT-KLLEDKREQLK KSKEHEKL-MEGE 636 

++++++ E+ + +EA KL + EQ+K K+ E E++ ++ E 

Sbjct: 399 EELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQE 458 

Query: 637 LEALRQEFKK-KDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNK 695 

L ++QE K+ +E KL++ +E EL +L L T ++ Q+ K 

Sbjct: 459 LSRVKQEVVDVMKKSSEEQIAKLQKLHEK— ELARKEQELTKKLQ— -TREREFQEQMK 512 

Query: 696 EIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLN-H 754 

+AL+K L+ +K Q+ + + K+A S DL Q E 

Sbjct: 513 -VALEKSQSEYLKISQEKEQQESIALEELELQKKAILTESENKLR DLQQEAETYRTR 568 

Query: 755 VTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEV— HAFD 812 

+ SL++SL QE K Q ++ + E K N E+ + H+ +ELE H D 
Sbjct: 569 ILELESSLEKSL QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHOQD 624 

Query: 813 KKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLRE FQEEMAALKENLLED-DK 862 

E QVL+ +Q+Q +++. L K EQ +E FQ + + E LE D 

Sbjct: 625 ALWTE-KLQVLK— QQYQTEMEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDV 681 

Query: 863 EPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLSGEK 922 

+ LS++++++L Q ++L ++ EQ N+ + 

Sbjct: 682 KQTELE— SLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSI 739 

Query: 923 DHLHSVMVHLQQENKKLKKEIEEKKM 948 

H V + Q+ K LK +1 + ++ 
Sbjct: 740 IKEHEVSI--QRTEECALKDQINQLEL 763 

Score = 183 (27.5 bits), Expect = 1.3e-09, P = 1.3e-09 
Identities = 132/584 (22%), Positives = 251/584 (42%) 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK-QDKSKEAECKALQAEVQK 467 

M ++L++K+++ QL+ + +TM++++E +Q 

Sbjct: 1 MFKKLKQKISEEQQQLQQALAPAQASSNSSTPTRMRSRTSSFTEQLDEGTPNRESGDTQS 60 

Query: 468 LKNSLE-EAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA — DTIQEL 524 

L+ EL + ++ + + R+ L LD A D ++ 

Sbjct: 61 FAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTASFDPPSDM 120 

Query: 525 QRELQMLQKESSMAEKEQTSNRKRVEELSL ELSEALRKLENSDKEKRQLQKTVAE 579 

E + L S KEQ R R E SL + SE + + +EK++LQ +++ 

Sbjct: 121 DSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQ 180 

Query: 580 -QDMKMNDMLDRIKHQHREQGSIKCKLEE DLQEATK LLEDKREQLKKSKEHEKL 632 

QD + + + + +Q + K EE L+E + +L+ + LK+ + + 
Sbjct: 181 SQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQVSLLKQRLRNGPM 240 

Query: 633 MEGELEALRQ-EFKKKDKTLKENSRKLEE ENENLRAELQCCSTQLESSLNKYNTSQQ 688 

L+ L Q E + + T +EN E E+ L+ +++ N + + 

Sbjct: 241 NVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKE 300 

Query: 689 VIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQA 748 

IQ ++ L +LQ QLD+ LQ E ++ E +++ A +L + 

Sbjct: 301 TIQSHKEQCTLLTSEKEALQEQLDERLQ-ELEKIKDLHMAEKTKLITQLRDA — KNLIEQ 357 

Query: 749 LEK-LNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELE 807 

LE+ V +ETK + + +T E K EEEI R+K++ T+ +LR Q+ + E 

Sbjct: 358 LEQDKGMVIAETK RQMHETLEMK EEEIAQLRSRIKQMTTQGEELR— EQKEKSE 409 

Query: 808 VHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQ EEMAALKENLLEDDKE 863 

AF EE+ + QK + K+A +EQ++ + EE +L++ L +E 
Sbjct: 410 RAAF EELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQELSRVKQE 465 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEKLGNQLR EQVNYIAK 917 

+ + S + +L + +++ + EQ K+ + + Q++ Q Y+ K 

Sbjct: 466 WDVMKKSSEEQIAKLQKLHEKELARKEQELTKKLQTREREFQEQMKVALEKSQSEYL-K 524 

Query: 918 LSGEKDHLHSVMVH-LQQENKKLKKEIEEK KMKAENTRLCTKALGPSRTESTQREK 972 

+S EK+ S++ L+ + K+ EEK + +AE R L S +S Q K 

Sbjct: 525 ISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRZLELESSLEKSLQENK 584 
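
The summary lines that head each alignment above can be cross-checked numerically. The following Python sketch is illustrative only and is not part of the original analysis; the function names are hypothetical. It assumes that the reported percentages are whole-number truncations of the residue ratios (this is consistent with every Identities/Positives pair listed above) and that P is related to Expect by the usual Poisson relation P = 1 - exp(-E), so that the two values coincide when E is very small.

```python
import math


def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return 100 * matches // aligned


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit for a given Expect value: 1 - exp(-E)."""
    # expm1 keeps precision for tiny E, where 1 - exp(-E) would evaluate to 0.0
    return -math.expm1(-expect)


# Identities = 184/827 and Positives = 405/827 from the first summary above
print(percent(184, 827), percent(405, 827))
# Expect = 4.4e-24 from the same summary
print(p_from_expect(4.4e-24))
```

For the first alignment this reproduces the reported 22% and 48%, and a P value that is numerically indistinguishable from the Expect value.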


Pedant information for DKFZphtes3_lgl3, frame 1 
Report for DKFZphtes3_lgl3. 1 


[LENGTH] 1007 
[MW] 117480.77 
[pI] 5.90 
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TREMBL:AF092090_1 product: "cplSl"; Rattus norvegicus cplSl mRNA, partial cds, 

[HOMOL] 0.0 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15 
[FUNCAT] 08.07 vesicular transport (Golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 5e-04 
[EC] 3.6.1.32 Myosin ATPase 1e-16 
[PIRKW] nucleus 3e-10 
[PIRKW] phosphotransferase 6e-09 
[PIRKW] duplication 2e-06 
[PIRKW] citrulline 2e-12 
[PIRKW] tandem repeat 1e-16 
[PIRKW] endocytosis 2e-13 
[PIRKW] heart 8e-13 
[PIRKW] transmembrane protein 1e-13 
[PIRKW] serine/threonine-specific protein kinase 6e-09 
[PIRKW] zinc finger 2e-13 
[PIRKW] metal binding 2e-13 
[PIRKW] DNA binding 4e-12 
[PIRKW] muscle contraction 1e-16 
[PIRKW] acetylated amino end 1e-11 
[PIRKW] actin binding 1e-16 
[PIRKW] mitosis 5e-15 
[PIRKW] microtubule binding 5e-15 
[PIRKW] ATP 1e-16 
[PIRKW] thick filament 1e-16 
[PIRKW] phosphoprotein 4e-16 
[PIRKW] skeletal muscle 2e-14 
[PIRKW] calcium binding 2e-12 
[PIRKW] alternative splicing 1e-16 
[PIRKW] coiled coil 1e-16 
[PIRKW] P-loop 1e-16 
[PIRKW] heptad repeat 3e-10 
[PIRKW] methylated amino acid 1e-16 
[PIRKW] immunoglobulin receptor 2e-06 
[PIRKW] peripheral membrane protein 2e-13 
[PIRKW] cardiac muscle 8e-13 
[PIRKW] hydrolase 1e-16 
[PIRKW] microtubule 3e-10 
[PIRKW] muscle 8e-13 
[PIRKW] EF hand 2e-12 
[PIRKW] cytoskeleton 2e-15 
[PIRKW] hair 2e-12 
[PIRKW] calmodulin binding 2e-13 
[PIRKW] Golgi apparatus 3e-10 
[SUPFAM] myosin heavy chain 1e-16 
[SUPFAM] conserved hypothetical P115 protein 1e-07 
[SUPFAM] centromere protein E 5e-15 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09 
[SUPFAM] calmodulin repeat homology 2e-12 
[SUPFAM] myosin motor domain homology 1e-16 
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07 
[SUPFAM] plectin 2e-07 
[SUPFAM] trichohyalin 2e-12 
[SUPFAM] pleckstrin repeat homology 8e-08 
[SUPFAM] ribosomal protein S10 homology 2e-07 
[SUPFAM] giantin 3e-13 
[SUPFAM] protein kinase homology 6e-09 
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08 
[SUPFAM] kinesin motor domain homology 6e-15 
[SUPFAM] human early endosome antigen 1 2e-13 
[SUPFAM] M5 protein 1e-07 
[PROSITE] LEUCINE_ZIPPER 7 
[PROSITE] MYRISTYL 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 20 
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[PROSrTEJ TYR_PH0SPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 15.00 % 
[KW] COILED_COIL 42.40 % 

SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccccccccccccccc 

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccccccccc 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCC CCCCCCCCCCCCCC 

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG . . . xxxxxxxxxx xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccc : 

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccccccccccccccccc 

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
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COILS 


SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL 
SEG xxxxxxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC 
SEG 
PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 
COILS 
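
The low-complexity figure in the keyword summary can be checked against the SEG rows. The sketch below is a hypothetical illustration, not part of the original report: the run lengths are the numbers of `x` characters read off each SEG row above (an assumption that depends on the legibility of the listing), and the sequence length is the reported [LENGTH] value.

```python
# Lengths of the masked ("x") runs, transcribed from the SEG rows in order (assumed values)
SEG_RUNS = [12, 22, 10, 8, 16, 14, 15, 10, 9, 18, 17]
SEQUENCE_LENGTH = 1007  # value reported as LENGTH in the summary


def low_complexity_percent(runs, length):
    """Share of residues masked by SEG, as a percentage rounded to two decimals."""
    return round(100 * sum(runs) / length, 2)


print(low_complexity_percent(SEG_RUNS, SEQUENCE_LENGTH))
```

Under these assumptions the masked residues total 151 of 1007, which agrees with the reported LOW_COMPLEXITY value of 15.00 %.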


PS00001 52->56 
PS00001 684->688 
PS00004 240->244 
PS00004 415->419 
PS00005 74->77 
PS00005 110->113 
PS00005 238->241 
PS00005 290->293 
PS00005 392->395 
PS00005 396->399 
PS00005 444->447 
PS00005 503->506 
PS00005 544->547 
PS00005 566->569 
PS00005 600->603 
PS00005 650->653 
PS00005 655->658 
PS00005 735->738 
PS00005 876->879 
PS00005 968->971 
PS00006 39->43 
PS00006 53->57 
PS00006 68->72 
PS00006 116->120 
PS00006 190->194 
PS00006 250->254 
PS00006 296->300 
PS00006 439->443 
PS00006 444->448 
PS00006 471->475 
PS00006 520->524 
PS00006 536->540 
PS00006 566->570 
PS00006 576->580 
PS00006 650->654 
PS00006 674->678 
PS00006 804->808 
PS00006 888->892 
PS00006 963->967 
PS00006 968->972 
PS00007 135->143 
PS00008 207->213 
PS00008 599->605 
PS00029 83->105 
PS00029 90->112 
PS00029 97->119 
PS00029 104->126 
PS00029 403->425 
PS00029 410->432 
PS00029 918->940 

Prosite for DKFZphtes3_lgl3 . 1 

ASNGLYCOSYLATION PDOCOOOOl 

ASN_GLYCOSYLATION PDOCOOOOl 

CAMP_PHOSPHO_SITE PDOC00004 

CAMPPHOSPHOSITE PDOC00004 

PKC_PH0SPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC^PHOSPHOSITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC_PHOSPHO_SITE PDOC00005 

PKC PHOSPHORS ITE PDOC00005 

PKC"PHOSPHO_SITE PDOC00005 

CK2_PHOSPHO SITE PDOC00006 

CK2_PHOSPHO]3siTE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PH0SPH0_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC0000 6 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPH0_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2_PHOSPH0_SITE PDOC00006 

CK2_PHOSPHO_SlTE PDOC00006 

CK2_PHOSPHO_SITE PDOC00006 

CK2 PHOSPHO_SITE PDOC00006 

CK22pHOSPHO_SITE PDOC00006 

TYR_PHOSPHO_SITB PDOC00007 

MYRISTYL PDOC00008 

MYRISTYL PDOC00008 

LEUCINE_ZIPPER PDOC00029 

LEUCINE^ZIPPER PIXX:00029 

LEUCINE_2IPPER PDOC00029 

LEOCINE_ZIPPER PDOC00029 

LEUCINE_2IPPER PDOC00029 

LEUCINE_ZIPPER PDOC00029 

LEOCINE~ZIPPER PDOC00029 


(No Pfam data available for DKFZphtes3_lgl3.1)


DKFZphtes3_lkll


group: cell structure and motility

DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus
actin-binding protein (ENC-1).

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural
induction in vertebrates. The protein is related to the kelch family proteins and is expressed
during early gastrulation in the prospective neuroectodermal region of the epiblast and later
in development throughout the nervous system (NS). ENC-1 functions as an actin-binding protein,
organising the actin cytoskeleton during neural differentiation and development of the NS.
The novel protein is highly similar to ENC-1.

The new protein can find application in the modulation of cytoskeleton organisation in human
testicular cells.


strong similarity to mouse ENC-1

complete cDNA, complete cds, EST hits

Sequenced by DKFZ

Locus: unknown 

Insert length: 3525 bp 

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499 
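The two 3'-end positions can be cross-checked against the sequence itself. The sketch below looks for the canonical AATAAA polyadenylation signal and the trailing run of A residues in the last 35 bases of the insert (bases 3491 to 3525 of the listing). The function name is hypothetical, and the 0-based offset convention is an assumption that happens to reproduce the two positions quoted for this clone.

```python
def find_polya_features(seq: str, offset: int = 0):
    """Return (signal_pos, tail_pos) as 0-based positions shifted by `offset`.

    signal_pos: first occurrence of the AATAAA hexamer, or None.
    tail_pos:   start of the trailing run of A residues, or None.
    """
    seq = seq.upper()
    signal = seq.find("AATAAA")
    tail_len = len(seq) - len(seq.rstrip("A"))
    signal_pos = signal + offset if signal >= 0 else None
    tail_pos = len(seq) - tail_len + offset if tail_len else None
    return signal_pos, tail_pos


# Last 35 bases of the 3525 bp insert, copied from the sequence listing.
THREE_PRIME_END = "TTCTAACTTA" "ATAAAAAGAG" "TTGAGAAAAA" "AAAAA"

# 3490 bases precede this fragment in the full insert.
signal_pos, tail_pos = find_polya_features(THREE_PRIME_END, offset=3490)
print(signal_pos, tail_pos)
```

Under these assumptions the signal and the poly(A) stretch fall at the positions given in the annotation line.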


1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC 
51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC 
101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG 
151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC 
201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC 
251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC 
301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC 
351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG 
401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT 
451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT 
501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC 
551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG 
601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGG 
651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC 
701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA 
751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT 
801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC 
851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC 
901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA 
951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC 
1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA 
1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC 
1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG 
1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA

1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT 
1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA 
1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA 
1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC 
1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG 
1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT 
1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC 
1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC 
1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA 
1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA 
1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC 
1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC
1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG 
1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG 
1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC 
1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG 
2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGCTTTAG 
2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA 
2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG 
2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG 
2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG 
2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTGCCA CACATCCTTT 
2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC 
2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA 
2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC 
2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC 
2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT 
2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG 
2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT 
2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC 
2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC
2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGAGGGAC CCCAGGGTGT
2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG 
2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG 
2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC 
2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT 
3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC 
3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG 
3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA 
3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT 
3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC 
3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC 
3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC 
3351 AGCAGGAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG 
3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC 
3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA 
3501 ATAAAAAGAG TTGAGAAAAA AAAAA 
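The listing uses a numbered layout: a position label followed by blocks of ten bases. A minimal sketch for turning such lines back into one contiguous string is shown below; the function name is hypothetical, and the parser simply assumes that any leading all-digit tokens are position labels.

```python
def parse_numbered_listing(lines):
    """Join a numbered, 10-base-block sequence listing into one string."""
    blocks = []
    for line in lines:
        tokens = line.split()
        # Drop leading numeric tokens; this also tolerates a position
        # label that has been split into several numeric pieces.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        blocks.extend(tokens)
    return "".join(blocks).upper()


# First two lines of the listing above.
sample = [
    "1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC",
    "51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC",
]
sequence = parse_numbered_listing(sample)
print(len(sequence), sequence[:10])
```

Comparing the length of the joined string with the stated insert length is a quick way to spot dropped or duplicated characters in a transcribed listing.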


BLAST Results 


No BLAST result 


Medline entries 


98350113: 

Cloning of human ENC-1 and evaluation of its expression 
and regulation in nervous system tumors. 

97252647: 

ENC-1: a novel mammalian kelch-related gene specifically expressed in
the nervous system encodes an actin-binding protein.

98234394:

NRP/B, a novel nuclear matrix protein, associates with
p110(RB) and is involved in neuronal differentiation.


Peptide information for frame 2 


ORF from 116 bp to 1882 bp; peptide length: 589 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
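The ORF coordinates and the peptide length are tied together by simple arithmetic. The sketch below assumes 1-based, inclusive coordinates that exclude the stop codon; that convention is inferred from the numbers in this listing rather than stated in it, and the helper name is hypothetical.

```python
def orf_codons(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


# ORF from 116 bp to 1882 bp, as stated for frame 2 above.
peptide_length = orf_codons(116, 1882)
print(peptide_length)
```

The same check can be run on any "ORF from X bp to Y bp" line in the listing to confirm that the quoted peptide length is consistent with the coordinates.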


1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL 
51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL 
101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL 
151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD 
201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV
251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL
301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS
351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA
401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG
451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD
501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ
551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA 

BLASTP hits 

Entry MMU65079_1 from database TREMBL: 

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus
actin-binding protein (ENC-1) mRNA, complete cds.

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589

Entry AF059611_1 from database TREMBLNEW:

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens
nuclear matrix protein NRP/B (NRPB) mRNA, complete cds.

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589

Entry AF010314_1 from database TREMBL:

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA,
complete cds.

Score = 1745, P = 7.8e-180, identities = 335/507, positives = 403/507

Entry KELC_DROME from database SWISSPROT:

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring
canal protein"; Drosophila melanogaster ring canal protein and ORF2
mRNA, complete cds.

Score = 672, P = 3.9e-66, identities = 168/536, positives = 257/536 
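The identities and positives fractions in these score lines are easier to compare as percentages. The sketch below parses one score line and converts both fractions; the regular expression and function name are illustrative assumptions about this report layout, not part of any BLAST tool.

```python
import re

# Matches "identities = a/b, positives = c/d" as printed in the score lines.
HIT_LINE = re.compile(
    r"identities\s*=\s*(\d+)/(\d+),\s*positives\s*=\s*(\d+)/(\d+)"
)


def hit_percentages(line: str):
    """Return (percent identity, percent positives), or None if no match."""
    match = HIT_LINE.search(line)
    if match is None:
        return None
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return round(100 * ident / ident_len, 1), round(100 * pos / pos_len, 1)


enc1_hit = "Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589"
print(hit_percentages(enc1_hit))
```

For the mouse ENC-1 hit this works out to roughly three quarters of the residues being identical over the full 589-residue alignment.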


Alert BLASTP hits for DKFZphtes3_lkll, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_lkll, frame 2


Report for DKFZphtes3_lkll.2


[LENGTH] 589
[MW] 65923.45
[pI] 6.10
[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus actin-binding protein (ENC-1) mRNA, complete cds. 0.0
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 2e-09
[BLOCKS] BL01016D Glycoprotease family proteins
[PIRKW] zinc finger 1e-08
[PIRKW] DNA binding 1e-08
[PIRKW] transcription factor 1e-08
[SUPFAM] POZ domain homology 3e-68
[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15
[SUPFAM] A55R protein 5e-29
[SUPFAM] hypothetical protein YHR158c 4e-08
[SUPFAM] A55R protein middle region homology 5e-29
[SUPFAM] myxoma virus M9-R protein 1e-14
[SUPFAM] A55R protein carboxyl-terminal homology 5e-29
[KW] Alpha_Beta


SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 

PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh 

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL

PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh 

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR 

PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QSEDFNSLSKDTLLDLISSDELETEDERVVFEAILQWVKHDLEPRKVHLPELLRSVRLAL

PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc 

SEQ LPSDCLQEAVSSEALLMADERTKLIMDEALRCKTRILQNDGVVTSPCARPRKAGHTLLIL

PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee 

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV

PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee 

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG

PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc 

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ 

PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc 

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL

PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee 

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA 

PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc 


(No Prosite data available for DKFZphtes3_lkll.2)
(No Pfam data available for DKFZphtes3_lkll.2)

DKFZphtes3_ln3


group: signal transduction 

DKFZphtes3_ln3 encodes a novel 1196 amino acid protein with similarity to the S. pombe Tup1
protein.

The protein contains one WD-40 repeat, which is typical for the beta-transducin subunit of G-proteins.
The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, an RGD site is present.

The new protein can find application in modulating/blocking G-protein-dependent pathways.


similarity to Tup1p

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: /map="6q24"

Insert length: 5277 bp 

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244 


1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA 
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC 
101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC 
151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG 
201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA 
251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG 
301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC 
351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA 
401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA 
451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC
501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC 
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA 
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG 
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC 
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG 
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC 
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA 
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT 
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA 
1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC 
1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT 
1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT 
1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA 
1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA 
1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG 
1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC 
1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA 
1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT 
1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT 
1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT 
1551 TGTTGAGGCA TTTGAATGGT GGTCAAAATG TCCAAGAAAT CATTACCCAT 
1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG 
1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT 
1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT 
1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT 
1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG 
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT 
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA 
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC 
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG 
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA 
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA 
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG 
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC 
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA 
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA 
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA 
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG 
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG 
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA 
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG 
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA 
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT 
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC 
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT 
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC 
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC 
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG 
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA 
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA 
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA 
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT 
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC 
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA 
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC 
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA 
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA 
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG 
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG 
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT 
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG 
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA 
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC 
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG 
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG 
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT 
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC 
4101 ATGCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA 
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT 
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC 
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG 
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT 
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC 
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA 
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC 
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC 
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA 
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA 
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA 
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG 
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA 
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC 
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT 
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGATATATC
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA 
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT 
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC 
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG 
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC 
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA 
5251 AATATCTGTT TCTCTGCAAA AAAAAAA 


BLAST Results 


Entry HS32B1 from database EMBL:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1
Score = 4445, P = 0.0e+00, identities = 889/889

Entry U93816 from database EMBL:

Human exon-trapped sequence from 6q24.

Score = 965, P = 4.0e-35, identities = 193/193


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein

1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR
51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK 
101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV 
151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM 
201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS 
251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK 
301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH 
351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI 
401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL 
451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY 
501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM
551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK 
601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL 
651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV 
701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL
751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG 
801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST 
851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN 
901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC
951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN 
1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV
1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY
1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS 
1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE 
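The peptide listing can be spot-checked against the cDNA by translating the start of the ORF. The sketch below uses the standard genetic code in its compact 64-letter form and translates the first ten codons from position 19 of the ln3 insert; the function name is hypothetical, and only the first line of the cDNA listing is used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Bases 1-50 of the ln3 insert, copied from the cDNA listing.
FIRST_LINE = "GCTGCATAAAGCTGAGAGATGCCTACAGCTGAGAGTGAAGCAAAAGTAAA"

# The ORF is stated to start at base 19 (1-based), i.e. index 18.
n_terminus = translate(FIRST_LINE[18:48])
print(n_terminus)
```

The ten residues obtained this way can be compared with the first block of the peptide listing; the same approach helps resolve transcription doubts elsewhere in the sequence.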

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_ln3, frame 1 

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete
cds., N = 1, Score = 186, P = 1e-10

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S.pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N =
2, Score = 228, P = 1e-13

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S.pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1,
N = 1, Score = 215, P = 2.3e-13

SWISSPROT:Y2LL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13


>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein
(Pmc733) mRNA, complete cds.
Length = 321

HSPs:

Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18

Identities = 59/225 (26%), Positives = 111/225 (49%)

Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706 

+ E GH + I DLSWSK+ +L++S D T R+W ++ + +v H ++V +F+ 
Sbjct: 63 VHEFYGHGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766

P +TGC D ++RIW V LV + K + ++C+ +G +G TG 

Sbjct: 120 PTNGNYFITGCIDGLVRIWDVRK CLWDWANSKEIVTAVCYRPDGKGAVAGTITG 174 

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826
++ +LE V ++N K + + Y P K+L++ + D+ +RI 
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871 

+D +++ + G+ +++TPG+ + S+D +Y+WN 
Sbjct: 230 LDGAHVISN-YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272 


Pedant information for DKFZphtes3_ln3, frame 1 


Report for DKFZphtes3_ln3.1


[LENGTH] 1196
[MW] 137114.70
[pI] 6.79
[HOMOL] SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN C14B1.4 IN CHROMOSOME III. 8e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-08
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 9e-08
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 9e-08
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFLOOSw] 4e-05
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFLOOSw] 4e-05
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFLOOSw] 4e-05
[FUNCAT] 03.01 cell growth [S. cerevisiae, YCR088w] 6e-05
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05
[BLOCKS] BL00024H
[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-91
[SCOP] d1gfc 2.21.2.1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14
[SCOP] d1fmk_1 2.21.2.1.8 (1-64) c-src tyrosine kinase [human (Hom 5e-15
[SCOP] d1ad5b1 2.21.2.1.7 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15
[SCOP] d1lcka1 2.21.2.1.16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13
[SCOP] d1qwea_ 2.21.2.1.15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15
[SCOP] d1shg 2.21.2.1.6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13
[SCOP] d1prmc_ 2.21.2.1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15
[SCOP] d1hsq 2.21.2.1.12 Phospholipase C, SH3 domain [human (Hom 2e-13
[SCOP] d1aboa_ 2.21.2.1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13
[SCOP] d1efna_ 2.21.2.1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15
[SCOP] d1sema_ 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13
[SCOP] d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16
[SCOP] d1ckaa_ 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15
[EC] 3.1.4.3 Phospholipase C 2e-07
[EC] 3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07
[EC] 3.6.1.32 Myosin ATPase 7e-07
[EC] 2.7.1.112 Protein-tyrosine kinase 8e-06
[PIRKW] nucleus 2e-08
[PIRKW] phosphotransferase 8e-06
[PIRKW] plasma 4e-07
[PIRKW] duplication 4e-07
[PIRKW] phosphoric diester hydrolase 2e-07
[PIRKW] tandem repeat 7e-07
[PIRKW] hormone 4e-07
[PIRKW] transmembrane protein 2e-06
[PIRKW] stomach 4e-07
[PIRKW] actin binding 7e-07
[PIRKW] ATP 7e-07
[PIRKW] phosphoprotein 7e-07
[PIRKW] signal transduction 7e-09
[PIRKW] heterotrimer 7e-09
[PIRKW] P-loop 7e-07
[PIRKW] hydrolase 7e-07
[PIRKW] transcription regulation 5e-06
[PIRKW] GTP binding 7e-09
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07
[SUPFAM] SH3 homology 2e-07
[SUPFAM] SH2 homology 2e-07
[SUPFAM] protozoan myosin heavy chain IB 7e-07
[SUPFAM] myosin motor domain homology 7e-07
[SUPFAM] pleckstrin repeat homology 2e-07
[SUPFAM] protein-tyrosine kinase src 8e-06
[SUPFAM] WD repeat homology 3e-12
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07
[SUPFAM] protein kinase homology 8e-06
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07
[SUPFAM] GTP-binding regulatory protein beta chain 7e-09
[SUPFAM] yeast coatomer complex alpha chain 4e-07
[PROSITE] RGD 1
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 25
[PROSITE] TYR_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 19
[PROSITE] ASN_GLYCOSYLATION 6
[PFAM] Src homology domain 3
[PFAM] WD domain, G-beta repeats
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 5.77 %
[KW] COILED_COIL 2.42 %
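The PROSITE counts in this report come from short sequence patterns, and the scan is easy to reproduce. The sketch below rewrites four of the patterns as Python regular expressions and scans the first 60 residues of the ln3 peptide. The regular expressions are my transcription of the PROSITE pattern syntax, the function name is hypothetical, and the range convention (1-based start, end = start + pattern length) is an assumption inferred from the ranges printed in this document.

```python
import re

# PROSITE-style patterns rewritten as regular expressions (transcribed by hand).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",   # N-{P}-[ST]-{P}
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",    # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",        # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",     # [ST]-x(2)-[DE]
}


def scan_motifs(peptide: str) -> dict:
    """Return {motif name: [(start, end), ...]} including overlapping hits."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A lookahead with a capture group reports overlapping matches too.
        lookahead = re.compile("(?=(" + pattern + "))")
        hits[name] = [
            (m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in lookahead.finditer(peptide.upper())
        ]
    return hits


# Residues 1-60 of the ln3 peptide, copied from the peptide listing.
N_TERMINAL_60 = "MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT"

motif_hits = scan_motifs(N_TERMINAL_60)
for name, ranges in motif_hits.items():
    print(name, ranges)
```

Because such short patterns match by chance very often, hits of this kind indicate candidate sites only and say nothing about whether a site is actually modified.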


SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT 
SEG xxxxxxxx 

COILS ccccccccccccccccccccccccccccc! 

IgotB 


SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED 
SEG 


COILS 
IgotB 


SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTHQKTHTKPQPGVDHQKSEKANEGREET 

SEG xxx

COILS 

IgotB !!!!!!!!!!!!!.! 


SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE

SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx

COILS

IgotB


SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK

SEG xxxxxxxxxx xxxx

COILS

IgotB


SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM

SEG xxxxxxxxx 

COILS 

IgotB 


SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW

SEG 

COILS 

IgotB 


SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK 

SEG 

COILS 

IgotB [\[[ 


SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP 

SEG 

COILS !!!!!!!!!!!! 

IgotB !!!!!!!!!! 


SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK 

SEG 

COILS

IgotB

SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL

SEG

COILS

IgotB CEEEEEECCCCCEEEE


SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS 

SEG 

COILS 

IgotB EETTTTTEEEEEETTTEEEEEETT — TTCEEEEEETTTCEEEEEETTT-TCEEEEEETTT 

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL

SEG 

COILS 

IgotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEBEEEE 

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA 

SEG 

COILS 

IgotB 

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN 

SEG 

COILS 

IgotB 

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS 

SEG 

COILS 

IgotB 

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN 

SEG 

COILS 

IgotB 

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV 

SEG 

COILS 

IgotB 

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP

SEG

COILS

IgotB

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE

SEG

COILS

IgotB


Prosite for DKFZphtes3_ln3 . 1 


PSOOOOl 
PSOOOOl 
PSOOOOl 
PSOOOOl 
PSOOOOl 
PSOOOOl 
PS00004 
PS00004 
PS00004 
PS00004 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PSOOOOS 
PS00005 
PSOOOOS 
PSOOOOS 
PSOOOOS 


1190->1194 


1000->1004 
1065->1069 
1148->1152 
91->95 


170->173 
232->235 
268->271 
304->307 
327->330 
352->355 
384->387 
440->443 
533->536 
546->549 
643->64e 
677->680 
690->693 
702->705 


264->268 
305->309 


460->464 
686->690 
934->938 


48->Sl 
66->69 
93->96 


ASN_GLYCOS YL AT ION 
ASN_GLYCOSYLATION 
ASN GLYCOSYLATION 
ASN^GLYCOSYLATION 
ASN^GLYCOSYLATION 
ASNGL YCOS YLAT I ON 
CAMP_PHOSPHO_SITE 
CAMP_PHOSPHO_SITE 
CAMP_PHOSPHO_SITE 
CAMP PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOS PHO_S I TE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO__SITE 
PKC PHOSPHORS ITE 
PKC"PHOS PHO_S I TE 
PKC^PHOS PHO_S I TE 
PKC~PHOS PHO_S I TE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOS PHO__S ITE 
PKC_PHOS PHO_S ITE 
PKC_PHOSPHO_SITE 
PKC_PHOS PHO_S ITE 
PKC PHOSPHORS I TE 
PKC~PHOSPHO_S ITE 
PKC PHOSPHO SITE 


PDOCOOOOl 
PDOCOOOOl 
PDOCOOOOl 
PDOCOOOOl 
PDOCOOOOl 
PDOCOOOOl 
PEXX:00004 
PDOC00004 
PDOC00004 
PDOC00004 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOCOOOOS 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PS00005   823->826     PKC_PHOSPHO_SITE   PDOC00005
PS00005   973->976     PKC_PHOSPHO_SITE   PDOC00005
PS00006   22->26       CK2_PHOSPHO_SITE   PDOC00006
PS00006   59->63       CK2_PHOSPHO_SITE   PDOC00006
PS00006   77->81       CK2_PHOSPHO_SITE   PDOC00006
PS00006   116->120     CK2_PHOSPHO_SITE   PDOC00006
PS00006   137->141     CK2_PHOSPHO_SITE   PDOC00006
PS00006   180->184     CK2_PHOSPHO_SITE   PDOC00006
PS00006   245->249     CK2_PHOSPHO_SITE   PDOC00006
PS00006   276->280     CK2_PHOSPHO_SITE   PDOC00006
PS00006   283->287     CK2_PHOSPHO_SITE   PDOC00006
PS00006   288->292     CK2_PHOSPHO_SITE   PDOC00006
PS00006   292->296     CK2_PHOSPHO_SITE   PDOC00006
PS00006   327->331     CK2_PHOSPHO_SITE   PDOC00006
PS00006   390->394     CK2_PHOSPHO_SITE   PDOC00006
PS00006   454->458     CK2_PHOSPHO_SITE   PDOC00006
PS00006   510->514     CK2_PHOSPHO_SITE   PDOC00006
PS00006   570->574     CK2_PHOSPHO_SITE   PDOC00006
PS00006   663->667     CK2_PHOSPHO_SITE   PDOC00006
PS00006   672->676     CK2_PHOSPHO_SITE   PDOC00006
PS00006   804->808     CK2_PHOSPHO_SITE   PDOC00006
PS00006   985->989     CK2_PHOSPHO_SITE   PDOC00006
PS00006   1023->1027   CK2_PHOSPHO_SITE   PDOC00006
PS00006   1127->1131   CK2_PHOSPHO_SITE   PDOC00006
PS00006   1132->1136   CK2_PHOSPHO_SITE   PDOC00006
PS00006   1161->1165   CK2_PHOSPHO_SITE   PDOC00006
PS00006   1170->1174   CK2_PHOSPHO_SITE   PDOC00006
PS00007   1083->1091   TYR_PHOSPHO_SITE   PDOC00007
PS00007   211->219     TYR_PHOSPHO_SITE   PDOC00007
PS00007   1083->1091   TYR_PHOSPHO_SITE   PDOC00007
PS00007   210->219     TYR_PHOSPHO_SITE   PDOC00007
PS00008   483->489     MYRISTYL           PDOC00008
PS00008   577->583     MYRISTYL           PDOC00008
PS00008   716->722     MYRISTYL           PDOC00008
PS00008   800->806     MYRISTYL           PDOC00008
PS00008   861->867     MYRISTYL           PDOC00008
PS00008   941->947     MYRISTYL           PDOC00008
PS00009   811->815     AMIDATION          PDOC00009
PS00009   1188->1192   AMIDATION          PDOC00009
PS00016   1074->1077   RGD                PDOC00016
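
The motif hits listed above are matches to short PROSITE patterns. As a rough illustration only, and not the tool that produced this report, the following sketch scans a peptide fragment with regular-expression forms of three widely documented patterns (PS00001, PS00005, PS00006). The function name `scan_motifs`, the `offset` parameter and the start->end convention (end = start + pattern length) are assumptions made for this example, and the fragment is the SEQ row beginning QKQSINKNKS, assumed here to start at residue 1141.

```python
import re

# Regex forms of three PROSITE patterns (pattern syntax shown in comments).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
}


def scan_motifs(peptide, offset=1):
    """Return (name, start, end) hits; `offset` is the residue number of peptide[0]."""
    hits = []
    for name, pattern in PATTERNS.items():
        # A lookahead lets overlapping matches be reported as well.
        for m in re.finditer("(?=(" + pattern + "))", peptide):
            start = m.start() + offset
            hits.append((name, start, start + len(m.group(1))))
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Hypothetical usage: fragment assumed to cover residues 1141-1170.
fragment = "QKQSINKNKSQDFRLGSESMTHSEMRKEQS"
for name, start, end in scan_motifs(fragment, offset=1141):
    print(f"{start}->{end}  {name}")
```

Under these assumptions the sketch reports 1148->1152 (ASN_GLYCOSYLATION) and 1161->1165 (CK2_PHOSPHO_SITE), both of which appear among the positions listed for this protein.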


Pfam for DKFZphtes3_1n3.1

HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*

+ GH+N ++++++S D ++ I+++S DGT R+W
Query 650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK 682


HMM_NAME Src homology domain 3

HMM *pyVIALYDYqAqdpDELSFkEGDIIiIIEdsDD.WWrgRnnnTNGQEGW

P+V+ALYDY+A+++DEL++ +GDII + ++++ WW+G GQEG+
Query 1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 1100

HMM IPSNYVEPi* 
+P+N V+ + 

Query 1101 FPANHVASE 1109 


WO 01/12659 PCT/IB00/01496


DKFZphtes3_20c21


group: testes derived 

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


unknown 

Sequenced by MediGenomix 
Locus: /map="22q11.2-12.2"
Insert length: 3997 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 


1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 
101 AGTCCCGAAC CCCGAGCACT GGGATGCCTG GCTACTCCGA GCCAAGGCAC 
151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 
201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 
251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 
301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 
351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 
401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 
451 ATGCTGAGTG AAAACTAACC AAGTGGCAGT TGTGACTGAA AACACTGAAA
501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 
551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 
601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 
651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 
701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 
751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 
801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 
851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 
901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 
951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC CTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTG CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG CAGTTCAGGC 
2151 TCAGCAAACT GTCAGGGTGC TGGCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG 
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT TGCAGATTGG
2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA GACCCAGAGC 
2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC GGGACGGGCT 
3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA TCTCCTGCTG 
3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT GGCATTTTTT 
3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC AACCCCCACA 
3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC GGATCTGTGA 
3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC AGTGCGCTCT 
3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC ACACGGAGAC 
3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG GGCGTCTTCT 
3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG GAGTTTGGGC 
3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT GAGAATTCAG 
3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG GGCCGTGCAG 
3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC AAGATGTCTG 
3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGTU^ AGGGAGCGAC 
3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC CTCAGTTCTA 
3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG ACTGCAGGAT 
3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA GTACTTGAGA 
3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT GAACTGATAT 
3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT 
3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 
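
The header of this entry reports a poly(A) stretch and a polyadenylation signal by position. The following is a minimal sketch of how such positions could be located, assuming the canonical AATAAA hexamer and a terminal run of at least 15 A residues; the function name `locate_3prime_features`, the run-length threshold and the 0-based numbering are assumptions for this example, not a description of the annotation software actually used.

```python
import re


def locate_3prime_features(seq, offset=0):
    """Return (signal_pos, polya_pos) as 0-based positions, or None where absent.

    `offset` is the 0-based position of seq[0] within the full insert.
    """
    seq = seq.replace(" ", "").upper()
    signal = seq.rfind("AATAAA")        # last canonical signal hexamer
    tail = re.search(r"A{15,}$", seq)   # terminal run of at least 15 A's (assumed threshold)
    return (
        signal + offset if signal >= 0 else None,
        tail.start() + offset if tail else None,
    )


# Bases 3801-3900 of the insert as printed above (0-based offset 3800).
fragment = (
    "CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT"
    "AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA"
)
print(locate_3prime_features(fragment, offset=3800))
```

With 0-based numbering the sketch returns 3853 for the signal and 3877 for the start of the poly(A) run, which agrees with the positions stated in the header; this suggests, but does not prove, that those header positions are 0-based.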


BLAST Results 


Entry HS1048E9 from database EMBLNEW: 

Human DNA sequence from clone 1048E9 on chromosome 22q11.2-12.2
Contains pseudogene similar to ribosomal protein S3A and part of a gene
similar to C.elegans protein CE02118, ESTs, STS, GSS.
Score = 6540, P = 0.0e+00, identities = 1308/1308
-14 exons 


Medline entries 


No Medline entry


Peptide information for frame 3 


ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 

1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 
51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL 
101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 
151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI 
201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 
251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 
301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL 
351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 
401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS 
451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES
501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG 
551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS 
601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT 
651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL 
701 LKHGVNLL 
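
The ORF coordinates and the peptide length given above are related by simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; `peptide_length` is a hypothetical helper name introduced for illustration.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based inclusive coordinates (assumed convention)."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


# ORF from 618 bp to 2741 bp, as stated for frame 3 of this entry.
print(peptide_length(618, 2741))
```

For this entry the range 618 to 2741 spans 2124 bases, that is 708 codons, matching the stated peptide length of 708.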


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 


Report for DKFZphtes3_20c21.3

[LENGTH] 708

[MW] 76900.23

[pI] 5.30

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.36 %


SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG - xxxxxxxxxxxx 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN

SEG 

PRD eeeeeeccccccchhhhhhhhheeeeccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh 

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG 

SEG xxxxxxxxxxxxxxxxxxxxx .... 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS 

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA 

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhccceeeee 

SEQ CCNPIQETYFQQLAPAARSSGFPNPQDGAFSLSGKAKQKLLKHGVNLL

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 
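
The low-complexity percentage reported for this protein can be cross-checked against the SEG rows, in which each masked residue is marked with an x. The sketch below is an illustration under that assumption; `low_complexity_percent` is a hypothetical helper, and the run lengths (12, 21 and 12) are as counted from the SEG rows printed above, which may themselves be affected by extraction errors.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues marked 'x' in the SEG rows, rounded to two decimals."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# SEG rows that carry a mask for this 708-residue protein (run lengths as counted above).
seg_rows = ["- " + "x" * 12, "x" * 21 + " ....", "x" * 12]
print(low_complexity_percent(seg_rows, 708))
```

Forty-five masked residues out of 708 give 6.36 %, in agreement with the reported value.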


(No Prosite data available for DKFZphtes3_20c21.3)
(No Pfam data available for DKFZphtes3_20c21.3)


DKFZphtes3_20k2


group: signal transduction 


DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid
receptor subtype 1.

VR1 seems to play an important role in the activation and sensitization of nociceptors. It is
the receptor for, e.g., capsaicin, a natural product of capsicum peppers and a selective
activator of nociceptors. The novel protein is the human orthologue of rat VR1.

The new protein can find application as a target for the development of new
nociception-modulating drugs.


strong similarity to rat vanilloid receptor subtype 1 
Sequenced by MediGenomix 
Locus : unknown 


Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135 


1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA 
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT 
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT 
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT 
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG 
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG 
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG 
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC 
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC 
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA 
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT 
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT 
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT 
651 GCCAGGATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC 
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT 
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC 
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC 
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT 
901 CGAGAGACGC AACATGGCCC TGGTGACCCT CCTGGTGGAG AACGGAGCAG 
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGGCGG 
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA 
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG 
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG 
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA 
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG 
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG 
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA 
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG 
1451 AACTCGGTGC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG 
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT 
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC 
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG 
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG 
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT 
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG 
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG 
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC 
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA 
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC 
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA 
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA 
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT 
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC 
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT 
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA 
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG 
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA 
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC 
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC 
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT 
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT 
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG 
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC 
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG
2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC 
2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG 
2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG 
2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT 
2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG 
3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG 
3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA 
3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA 
3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT 
3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA 
3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG 
3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG 
3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT 
3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT 
3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC 
3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG 
3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG 
3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA
3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG 
3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA 
3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA 
3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG 
3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG 
3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC 
3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG 
4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT 
4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG 
4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT 
4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 


BLAST Results 


NO BLAST result 


Medline entries 


99288727:

Recent advances in neuropharmacology of cutaneous nociceptors.

99231880:

A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates
rat dorsal root ganglion neurons via interaction at vanilloid receptors.


Peptide information for frame 2 


ORF from 272 bp to 2788 bp; peptide length: 839 
Category: strong similarity to known protein 
Classification: Cell signaling/communication

1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF 
51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD 
101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK 
151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY 
201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE 
251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA
301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL 
351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI 
401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT 
451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR 
501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT
551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE
601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF 
651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL 
701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW 
751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA 
801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK 


BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_20k2, frame 2

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1,
Score = 3760, P = 0

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219


>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus
norvegicus vanilloid receptor subtype 1 mRNA, complete cds.
Length = 838

HSPs:

Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 721/839 (85%), Positives = 773/839 (92%)

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+++C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P 
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120 

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119 

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180 

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++ 
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240 

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT 
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239 

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300 

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299 

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E 
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359 

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420 

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419 

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479 

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++ +GDYFRVTGEI 
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEI 479

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659 

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719 

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718 

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778 

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK 
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838 
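
The percentages printed in the HSP header follow from the match counts. As a small illustration, the sketch below assumes the report truncates rather than rounds to a whole number; `percent_truncated` is a hypothetical helper name, and the truncation rule is inferred from the figures above rather than taken from any BLAST documentation.

```python
def percent_truncated(matches, aligned_length):
    """Whole-number percentage obtained by integer division (truncation, assumed)."""
    return (100 * matches) // aligned_length


print(percent_truncated(721, 839))  # identities
print(percent_truncated(773, 839))  # positives
```

Truncation gives 85 and 92, the values shown in the report, whereas rounding 721/839 to the nearest whole number would give 86.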


Pedant information for DKFZphtes3_20k2, frame 2

Report for DKFZphtes3_20k2.2


[LENGTH] 839

[MW] 94950.75

[pI] 6.90

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus
vanilloid receptor subtype 1 mRNA, complete cds. 0.0

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05

[PIRKW] alternative splicing 3e-06

[PIRKW] peripheral membrane protein 3e-06

[SUPFAM] ankyrin repeat homology 3e-06

[SUPFAM] unassigned ankyrin repeat proteins 3e-06

[PFAM] Ank repeat

[KW] TRANSMEMBRANE 4


SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh

MEM

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 

SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec

MEM MMMMMMMMMMMMMMMMM

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF 

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MM 

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM 

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS 

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 


(No Prosite data available for DKFZphtes3_20k2.2)

Pfam for DKFZphtes3_20k2.2

HMM_NAME Ank repeat

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+T+LHIA +++N+ +V LL+++GAD+
Query 202 GQTALHIAIERRNMALVTLLVENGADVQ 229


DKFZphtes3_20l3


group: transmembrane protein 

DKFZphtes3_20l3 encodes a novel 595 amino acid protein with partial similarity to the IL-17
receptor.

The novel protein contains one transmembrane region.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.


similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp 

Poly A stretch at pos. 2345, no polyadenylation signal found


1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 
51 AGCCAGCAGA AACAGTGGGC TGTACAACAT CACCTTCAAA TATGACAATT 
101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 
151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 
201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 
251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 
301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 
351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 
401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 
451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG 
501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 
551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 
601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 
651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 
701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA 
751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG 
801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 
851 TCGCGACGCT CTTCACTGTG ATGTGCCGCA AGAAGCAACA AGAAAATATA 
901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 
951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 
1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 
1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 
1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 
1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 
1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 
1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 
1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 
1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 
1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 
1451 CCCGAGACCA CGGCCTCCAG GAGCCGGGGC AGCACACGCG ACAGGGCAGC 
1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 
1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 
1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 
1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 
1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 
1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 
1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 
1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG 
1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG 
1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 
2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC CTTCCTTCCA 
2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 
2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA
2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 
2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 
2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA 
2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2401 AAAAAA 


BLAST Results 


No BLAST result 
Medline entries


No Medline entry 


Peptide information for frame 1 


ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 

1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL
51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR 
101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW 
151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT
201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW 
251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS
301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL 
351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV 
401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK 
451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ 
501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE 
551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20l3, frame 1

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P =


>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds.

Length = 866

HSPs: 

Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14

Identities = 85/284 (29%), Positives = 131/284 (46%)

Query: 213 KVFLCYSSKDGQNHMNVVQCFAYFLQDFCGCEVALDLWEDFSLCREGQREWV-IQK---I 268

KV++ YS+ D +++VV FA FL CG EVALDL E+ ++ G WV QK +
Sbjct: 379 KVWIIYSA-DHPLYVDVVLKFAQFLLTACGTEVALDLLEEQAISEAGVMTWVGRQKQEMV 437

Query: 269 HESQFIIVVCSKGMKY----FVDKKNYXXXXXXXXXXXXELFLVAVSAIAEXXXXXXXXX 324

+ IIV+CS+G + + + +LF A++ I

Sbjct: 438 ESNSKIIVLCSRGTRAKWQALLGRGAPVRLRCDHGKPVGDLFTAAMNMILPDFKRPACFG 497

Query: 325 XXXXXXFIAVYF-DYSCEGDVPGILDLSTKYRLMDNLPQLCSHLHSRDHGLQEPGQHTRQ 383

++ YF + SC+GDVP + + +Y LMD ++ + +D + +PG+ R
Sbjct: 498 T-----YVVCYFSEVSCDGDVPDLFGAAPRYPLMDRFEEV--YFRIQDLEMFQPGRMHRV 550

Query: 384 G--SRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQFV PFHPPPLR YREPV 434 

G S NY RS GR L A+ + PDWFE + + PL + EP+ 

Sbjct: 551 GELSGDNYLRSPGGRQLRAALDRFRDWQVRCPDWFECENLYSADDQDAPSLDEEVFEEPL 610 

Query: 435 LEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQHGGLDQDGEARP 491 

L +G+V ++PSCL++VGGA HL G+P 

Sbjct: 611 LPP-GTGIVKRAPLVRE-PGSQACLAIDPLV-GEEGGAAVAKLEPH— LQPRGQPAP 662 
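
The identity and positive percentages in the HSP summary can be recomputed from the match counts. The sketch below assumes the report truncates the percentage to a whole number rather than rounding it (85/284 is about 29.9% but is printed as 29%); that is an inference from the numbers shown, and the helper name `check_hsp_percentages` is hypothetical.

```python
import re

# Matches fields such as "Identities = 85/284 (29%)".
HSP_FIELD = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_hsp_percentages(line: str):
    """Return (label, reported_percent, recomputed_percent) for each field.

    Assumption: percentages are truncated (integer division), not rounded.
    """
    results = []
    for label, hits, length, reported in HSP_FIELD.findall(line):
        recomputed = 100 * int(hits) // int(length)
        results.append((label, int(reported), recomputed))
    return results


if __name__ == "__main__":
    summary = "Identities = 85/284 (29%), Positives = 131/284 (46%)"
    for label, reported, recomputed in check_hsp_percentages(summary):
        print(label, reported, recomputed)
```

A mismatch between the reported and recomputed value would point to a transcription error in either the counts or the percentage.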


Pedant information for DKFZphtes3_2013, frame 1 

[LENGTH] 595 

[MW] 66847.05 

[pI] 6.27 

[HOMOL] TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus interleukin 
17 receptor mRNA, complete cds. 2e-14 

[BLOCKS] BL00740A MAM domain proteins 

[BLOCKS] BL01224B N-acetyl-gamma-glutamyl-phosphate reductase proteins 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 13.61 % 


SEQ MESQPFLNMKFETDYFVKWPFPSIKNESNYHPFFFRTRACDLLLQPDNLACKPFWKPRN 

SEG 

PRD CGCCccccccccccceeeeeccccccccccceeeeeeceeeeeeeccccccccccccccc 

MEM 

SEQ LNISQHGSDMQVSFDHAPHNFGFRFFYLHYKLKHEGPFKRKTCKQEQTTEMTSCLLQNVS 

SEG 

PRD eeeecccccceeeecccccccceeeeeehhhhhhcccchhhhhhhhhhhhhhhhhhcccc 


MEM 


SEQ PGDYIIELVDDTNTTRKVMHYALKPVHSPWAGPIRAVAITVPLVVISAFATLFTVMCRKK 

SEG 

PRD ccceeeeeeccccccccccccccccccccccccceeeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ QQENIYSHLDEESSESSTYTAALPRERLRPRPKVFLCYSSKDGQNHMNVVQCFAYFLQDF 

SEG xxxxxxx xxxxxxxxxx 

PRD hi^hhhhhhhcccccccceeeeccccccccccceeeeeeecccccchhhhhhhhhhhhhhc 

MEM 

SEQ CGCEVALDLWEDFSLCREGQREWVIQKIHESQFIIVVCSKGMKYFVDKKNYKHKGGGRGS 

SEG xxxxxxxxx 

PRD ccchhhhhhhhccccccccchhhhhhhhhhheeeeeeeeccceeeeeccccccccccccc 

MEM _ 

SEQ GKGELFLVAVSAIAEKLRQAKQSSSAALSKFIAVYFDYSCEGDVPGILDLSTKYRLMDNL 

SEG xxx xxxxxxxxxxxxxxx 

PRD ccceeeeehhhhhhhhhhhhhhcchhhhhhhheeeeccccccccccccccchhhhhhccc 

MEM 

SEQ PQLCSHLHSRDHGLQEPGQHTRQGSRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQF 

SEG 

PRD cchhhhhhcccccccccccccccccceeeeccccccceeeeeeceeeecccccceeeeee 

MEM 

SEQ VPFHPPPLRYREPVLEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQH 

SEG 

PRD eecccccccccceeeeeccccceeeeecccccccccchhhhhhhhhhccccccccccccc 

MEM 

SEQ GGLDQDGEARPALDGSAALQPLLHTVKAGSPSDMPRDSGIYDSSVPSSELSLPLMEGLST 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccchh 

MEM 

SEQ DQTETSSLTESVSSSSGLGEEEPPALPSKLLSSGSCKADLGCRSYTDELHAVAPL 

SEG . . xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhheeecccccccccccccccceeeccccceeeccccccccceeeeccc 

MEM 


(No Prosite data available for DKFZphtes3_2013 . 1) 
(No Pfam data available for DKFZphtes3_2013 . 1) 
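
The LOW_COMPLEXITY keyword value can be related to the SEG rows of the report. The following sketch assumes that each `x` in a SEG row masks one residue and that the run lengths read from the rows as printed (7, 10, 9, 3, 15, 17 and 20) survived extraction intact; the function name is hypothetical.

```python
def low_complexity_percent(masked_runs, peptide_length):
    """Percentage of residues masked as low complexity, to two decimals.

    masked_runs: lengths of the 'x' runs taken from the SEG rows.
    Assumption: one 'x' corresponds to exactly one masked residue.
    """
    masked = sum(masked_runs)
    if peptide_length <= 0 or masked > peptide_length:
        raise ValueError("inconsistent mask and peptide length")
    return round(100.0 * masked / peptide_length, 2)


if __name__ == "__main__":
    # Run lengths as read from the SEG rows; peptide length from [LENGTH].
    runs = [7, 10, 9, 3, 15, 17, 20]
    print(low_complexity_percent(runs, 595))
```

With these inputs 81 of 595 residues are masked, which is consistent with the 13.61 % given in the keyword line.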
DKFZphtes3_20ml8 


group: nucleic acid management 

The new protein contains a leucine zipper and a Prosite mitochondrial energy transfer 
proteins signature. It is a member of a family of substrate carrier proteins which are found in 
the inner mitochondrial membrane and are involved in energy transfer. The RIM2/MRS12 ... 

The new protein can find application in modulation of mitochondrial DNA replication and 
maintenance. 

similarity to carrier protein RIM2 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 3572 bp 

Poly A stretch at pos. 3530, polyadenylation signal at pos. 3510 

1 GCCGCGGGGA GGGCTGTGCC GGTTGCTTTC TGCAGCCGCA TCTCGGCCAG 
51 CTCTCCTCGC CGTCCCCGGG GCGCTGTGCG TCTCCAGTCC GGGACCGAAG 
101 CCGCCTGCCG TAGCGGGCGG CCAGATCCGC GTCCCGCCTC AGCGGCCGGA 
151 GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT 
201 TGCCGGAGGA TGTGGTGGTA CAGTGGGAGC TATTCTGACA TGTCCACTGG 
251 AAGTTGTAAA AACACGACTG CAGTCATCTT CTGTGACGCT TTATATTTCT 
301 GAAGTTCAGC TGAACACCAT GGCTGGAGCC AGTGTCAACC GAGTAGTGTC 
351 TCCCGGACCT CTTCATTGCC TAAAGGTGAT CTTGGAAAAA GAAGGGCCTC 
401 GTTCCTTGTT TAGAGGACTA GGCCCCAATT TAGTGGGGGT AGCCCCTTCC 
451 AGAGCAATAT ACTTTGCTGC TTATTCAAAC TGCAAGGAAA AGTTGAATGA 
501 TGTATTTGAT CCTGATTCTA CCCAAGTACA TATGATTTCA GCTGCAATGG 
551 CAGGTATGAA TGTATAATAT TAAAAAAAAA AAAAACTTTC TGAAACCTAG 
601 AGGCTTAATA TTGAATTATA AGTTTGTAGT GAAAAGTTGA TGATTAATGT 
651 GCTTTTCATT GATTAGATGA TTTTTACGTT TATCGATATA AACCAAATTA 
701 GGTATATGTA AAATCTGTCA TCAGTTGACA TTTTTGTAGT CAGGAGTTTA 
751 CATGCTAGGG TACAAGTAAT ATATTTATAT TGCCTTGTGT AGTCCACTGA 
801 ATGTTTAGTG ATCATTGTTA ACAGTTTTAA GAATCCAACC ATAATTACAC 
851 TATAAATAAG TTATGGAGCT GTAATTTACT CTTCTCTCCT CAATTTCTGT 
901 TAGTGCCTTT TCCCTTTTTG CTGCATGTTT TGGCTTCTGT CTGAAATGTG 
951 TCGGCAATTC TTGGTAAAGT ATTCATTTTG TCCTGTGCTC AAATGCTGAA 
1001 ATTTTTGTGA GTGATGTATT ATTATTGACA ATTCAGTTAC TATGTGTATT 
1051 TTTTAAAATT GTTTATTATT CTACATAATT CACACTAGAC AGCACCTGAA 
1101 ATTTAGACAC TGGCTATGTG TACATGCTTA CTATAGAAAT GTTTCCAGGA 
1151 ACTCTCTGTT TCTGTCATCA CTGATAAGTA TATATGATTC TGAATTAAAA 
1201 TAACTAGTTT TAGGTCTTTA CCCTGCCATA AAGATAAACA GTTGGTTTGA 
1251 CCAATCTGGT TCTGGAATCA TTTGCTGCTA TGCATGTTAG ACAAAGCCAC 
1301 GAACTTTGAT TTTCCATTGA AAATTCTCCC TAATATCTGA GATTTATTGT 
1351 ATATTTACTC ATATCTCACA TTTTCAAATT ATGCTGTAAC TTTATAAACT 
1401 GTAGCTGCTT TCATCAGCTA TTGATCAATA AATTGAATGT CAATTATGTG 
1451 CTTAATAATG AGTGCCTTAA ACTGTTAAAC ACTTTTGGTT TAGAAATAAA 
1501 GTGAATCAAT TTGACCTATA TACTTCATGA AGTAAGTAAG TTTGAAATAC 
1551 AAATTTCTGA AAGGTCAATA GCCCTTATCG TATTACAAAT TGTTTTTAAG 
1601 GCTTTTTGTA TTTATTAATT GTCAGTTGAT TCACTGAAGC TTTAAAACTG 
1651 GAAGGGACAA TCCAAAGGTC AAAAGAGTGA AATACAATCA TTTACCAATA 
1701 AGGAAACCTT GGGCAAATTA TGTAATTTAT GTGAACCTCT CTTAGCTTAC 
1751 CCATGGAATG AGTCAAGTGG TCTACATAGA TTTGGATTTT GAGAATTAGT 
1801 TCTTTCATTT AGTGTTATAG AGATTATCTT GTTACAACTA GAATTATTTT 
1851 TAATGTAATT TTTACAGATG TTGAATATTA GTAGATAGGA TTTTTCCCCT 
1901 ACGAATTTGG ATGTAAGGTA AAGGTTGGTG GCCAGTGACA AACCTTATAA 
1951 CCACTTTATC AGGTTCTTTA AAAATATATT TGTGAATTAC CAGTGATTAT 
2001 GTTTTTGGCT TATAACCTCA GATAATTATA AAGAAATGTT AATCTTATTT 
2051 GAAAGAATTG GAATCTAGAA AGTTAGATGA GCAGTCATTT TATATTGATA 
2101 TTTGTTATAT CAGTATAGCA AATGCAGAGG TTCAGAATAT CTTTATTTCC 
2151 ACTGGAACAT CTTATTTCAT TAGAGTATCT CATCAGAATT TATTACTGTA 
2201 TTTGTATCAC ATTGCAAAGA ATTTCAGTAG AATTGTCAGT TTGCACTTTT 
2251 TTCTCAAATG TGTACAAATG TTAACATATA GTTCATTTTT ATCTGTACAT 
2301 TGATGCCATT TCCCAACTTG AATTCCTCAA GTTTTGGTAA ACTTACAATC 
2351 TCATACTTGT TCAGAGGTTA TTGCACTGTA CACTTACTGT GTAGAAAATA 
11:^1 ^l^'^'^^'^ TTGTTTGCAG TTACATTGTT CTGAGAACTG TGCTCTCAGA 
2451 GCTTCTGTGC ACTATTCATG AGCATTAACA CTTAGCCTTG CAGTTTTATA 
2501 CATAACTATA TGGTTAGTAA AACTGAATGG TCCAATGCAG ACTCATTAAA 
2551 GTAGGCTTTT GCCCCCTTTG TTCTTGAAAT AATCTAGACC AGATTACTCG 
2601 GGGTTTTTTT TAGGATTATT TTTATAGGTC TAAATATGAA TGATTTGGGG 
2651 GTATGAAGTA CTTAAAGATA GTTCTGTGAA AAATCATTTT CAGCTGTCTA 
2701 TTCAAGGGAA AAAATGCTAA CCTTGTCACT TTACTACACA AAACCACACT 
2751 AAAATAAACC ATTAATGATA CTGCCTGCAA GATTTTAACA CACCAGATAG 
2801 CACACACATT AAGGATTTAT AAGGCACTGT ACGTAATTTT TATTCCAAGT 
2851 GACCTCTCAA TTCATTTTCA TTTTGCATTT TATCCATATG AACTCATGTT 
2901 TAATTTAGAT AATAAAAATT TATTTTATTA AAAGGACAGT TTATTTAAAG 
2951 TGGGTCTTTT TATTTGTTGT AGTGCATACT ATAAGAATTT GTAAGCCTCT 
3001 AAAGTTGAGC TATAAATTTT CATGCATTAA AAATTTGTTT CAGTTGTGAG 
3051 GATATTTAAT CAGATTAAAT AATGTTGACT CTTAATATTT TGCCTGCCTT 
3101 TTTTTTCTCC TACACATGAC CTTTGACAGA CTAAGTATAT CTCAGCTATT 
3151 GAGGGTATCT GTTTTGTTGC CTGTATATTT TGTTTAAATT AACTTGTATA 
3201 TTCCTTTGTA TACACCTAGG CACAGATGTA TGCAAAAAAA ATTTGTTAAA 
3251 TTACTTCTTT CTTTATACTA ATTCTCAATT TTTAAAAGAT TTTATCTGGC 
3301 ATGTATATAC TTTTATATAG AACATTATAA ATGTAAAGGA AATGAATTCT 
3351 AATTTTAATT GGATTATGTA TTCATACAGT TATTCTCAAT TTTTAAAATA 
3401 CTAATAATGT AATCATTGAA TGTTTCCTAC ATACGTAGTG GGTTTTATTT 
3451 GCTCACAGCA TACAGTTATT TTTCAATTTA TGTTTTTCTA TTAGACTTAA 
3501 ATTTCATTAT AATAAAGGCT TTTACTCATT AAATACAAAA AAAAAAAAAA 
3551 AAAAAAAAAA AAAAAAAAAA AA 
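
The polyadenylation signal position given in the header can be located in the listing with a simple hexamer search. This is a sketch only: it assumes the canonical AATAAA and ATTAAA hexamers are what the annotation looks for, and it observes that the reported position 3510 coincides with a zero-based offset into the sequence (the AATAAA hexamer occupies printed positions 3511 to 3516). The function name and the excerpt variable are hypothetical.

```python
import re

# Lookahead so that overlapping hexamers are all reported.
POLYA_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_polya_signals(seq: str, offset: int = 0):
    """Return (position, hexamer) pairs for candidate poly(A) signals.

    Positions are zero-based offsets into seq, shifted by `offset`.
    Assumption: only the two canonical hexamers are of interest.
    """
    seq = seq.upper()
    return [(offset + m.start(1), m.group(1)) for m in POLYA_SIGNAL.finditer(seq)]


if __name__ == "__main__":
    # First 40 nt of the listing row that starts at printed position 3501,
    # i.e. at zero-based offset 3500.
    excerpt = "ATTTCATTAT" "AATAAAGGCT" "TTTACTCATT" "AAATACAAAA"
    for position, hexamer in find_polya_signals(excerpt, offset=3500):
        print(position, hexamer)
```

The first hit lands on 3510, matching the header; the second hit is a later ATTAAA hexamer that the header does not mention.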


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 

95198680: 

Overexpression of a novel member of the mitochondrial carrier family rescues defects in both 
DNA and RNA metabolism in yeast mitochondria. 

Peptide information for frame 1 


ORF from 169 bp to 564 bp; peptide length: 132 
Category: similarity to known protein 
Classification: Intracellular transport and traffic 
Prosite motifs: LEUCINE_ZIPPER (27-49) 
MITOCH_CARRIER (26-36) 


1 MSQRDTLVHL FAGGCGGTVG AILTCPLEVV KTRLQSSSVT LYISEVQLNT 
51 MAGASVNRVV SPGPLHCLKV ILEKEGPRSL FRGLGPNLVG VAPSRAIYFA 
101 AYSNCKEKLN DVFDPDSTQV HMISAAMAGM NV 
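
The start of the peptide can be checked against the nucleotide listing by translating the first codons of the ORF. The sketch below uses the standard genetic code and the 18 nucleotides at positions 169 to 186 of the insert sequence as printed; the `translate` helper is hypothetical and only illustrates the check.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base).
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 until a stop codon or the end of the input.

    Codons containing characters other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    # Nucleotides 169-186 of the insert sequence (start of the ORF).
    print(translate("ATGAGCCAGAGGGACACG"))
```

The six codons translate to MSQRDT, the first six residues of the peptide listed above.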


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20ml8, frame 1 

PIR:S44092 probable carrier protein c2 - Caenorhabditis elegans, N = 2, 
Score = 147, P = 1.5e-19 

PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) , N = 1, Score = 230, P = 6.2e-19 


>PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) 
Length = 377 

HSPs: 

Score = 230 (34.5 bits), Expect = 6.2e-19, P = 6.2e-19 
Identities = 55/133 (41%), Positives = 80/133 (60%) 

Query:   8 VHLFAGGCGGTVGAILTCPLEVVKTRLQSSS-VTLYISEVQLNTMAGA----SVNRVVSP 62 
           VH  AGG GG  GA++TCP ++VKTRLQS   +  Y S+  +N   G+    S+N V+   
Sbjct:  54 VHFVAGGIGGMAGAVVTCPFDLVKTRLQSDIFLKAYKSQA-VNISKGSTRPKSINYVIQA 112 

Query:  63 GP-----LHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFD--P 115 
           G      L  +  + ++EG RSLF+GLGPNLVGV P+R+I F  Y   K+     F+    
Sbjct: 113 GTHFKETLGIIGNVYKQEGFRSLFKGLGPNLVGVIPARSINFFTYGTTKDMYAKAFNNGQ 172 

Query: 116 DSTQVHMISAAMAG 129 
           ++  +H+++AA AG 
Sbjct: 173 ETPMIHLMAAATAG 186 

Score = 77 (11.6 bits), Expect = 1.1e+00, P = 6.8e-01 

Identities = 25/88 (28%), Positives = 39/88 (44%) 

Query: 3 QRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSP 62 

Q ++HL A G A T P+ ++KTR VQL+ SV + + 

Sbjct: 172 QETPMIHLMAAATAGWATATATNPIWLIKTR VQLDKAGKTSVRQYKNS 219 

Query: 63 GPLHCLKVILEKEGPRSLFRGLGPNLVG 90 

CLK ++ EG L++GL + +G • 
Sbjct: 220 WD— CLKSVIRNEGFTGLYKGLSASYLG 245 

Score = 71 (10.7 bits), Expect = 6.6e+00, P = 1.0e+00 
Identities = 28/91 (30%), Positives = 45/91 (49%) 

Query: 12 AGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSPGPLHCLKVI 71 

+ G V +1 T P EW+TRL+ + +NG R+G+KVI 
Sbjct: 294 SAGLAKFVASIATYPHEVVRTRLRQTP KEN G KRKYT-GLVQSFKVI 338 

Query: 72 LEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

+++EG S++ GL P+L+ P+ I F + 
Sbjct: 339 IKEEGLFSMYSGLTPHLMRTVPNSIIMFGTW 369 
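
The Expect and P values printed for each HSP appear to follow the usual Poisson relation P = 1 - exp(-E). The sketch below applies that relation; it is an assumption that the report used exactly this formula, although the pairs shown (for example Expect = 6.6e+00 with P = 1.0e+00) are consistent with it. The function name is hypothetical.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit given an expected count.

    Assumption: hits follow a Poisson distribution, so P = 1 - exp(-E).
    expm1 keeps precision when E is very small.
    """
    if expect < 0:
        raise ValueError("Expect value must be non-negative")
    return -math.expm1(-expect)


if __name__ == "__main__":
    for expect in (6.2e-19, 1.1, 6.6):
        print(f"E = {expect:.1e}  ->  P = {p_from_expect(expect):.1e}")
```

For very small E the two values are practically equal, which is why the best-scoring HSP shows the same number for Expect and P. For the middle HSP the formula gives roughly 0.67 from the printed E of 1.1, close to the reported 6.8e-01; the small difference is plausibly due to rounding of the displayed E.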


Pedant information for DKFZphtes3_20ml8, frame 1 


Report for DKFZphtes3_20ml8 . 1 

[LENGTH] 132 

[MW] 13993.36 

[pI] 8.42 

[HOMOL] PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast (Saccharomyces 
cerevisiae) 7e-19 

[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YEL006w] 1e-09 

[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. 
cerevisiae, YIL006w] 3e-09 

[FUNCAT] 07.04.07 anion transporters (cl, so4, po4, etc.) [S. cerevisiae, YKL120w] 
2e-08 

[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YPR011c] 3e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 4e-08 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 

[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YJR077c] 7e-05 

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YJR077c] 7e-05 

[BLOCKS] BL00215B Mitochondrial energy transfer proteins 

[BLOCKS] BL00215A Mitochondrial energy transfer proteins 

[PIRKW] duplication 6e-09 

[PIRKW] transmembrane protein 6e-09 

[PIRKW] mitochondrial inner membrane 4e-07 

[PIRKW] transport protein 5e-06 

[PIRKW] mitochondrion 7e-08 

[PIRKW] chloroplast 3e-08 

[SUPFAM] Btl protein 3e-08 

[SUPFAM] ADP,ATP carrier protein repeat homology 4e-09 

[SUPFAM] Caenorhabditis probable carrier protein c2 4e-09 

[SUPFAM] probable carrier protein YPR021c 6e-09 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MITOCH_CARRIER 1 

[PFAM] Mitochondrial carrier proteins 

[KW] Alpha_Beta 


SEQ MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVV 
PRD cccccceeeecccccccceeeeeecchhhhhhhhhhhccccccccccccccccccccccc 

SEQ SPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFDPDSTQV 

PRD cccchhhhhhhhhhcccceeeeccccceeeecccceeeeeehhhhhhhhhcccccccccc 

SEQ HMISAAMAGMNV 

PRD chhhhhhhcccc 
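
The PRD rows can be summarised as a secondary structure composition. This is a minimal sketch, assuming the usual convention that `h` marks helix, `e` extended strand and `c` coil; the report itself does not define the letters, and the function name is hypothetical.

```python
from collections import Counter


def structure_composition(prd: str) -> dict:
    """Fraction of residues predicted in each state of a PRD string.

    Assumption: 'h' = helix, 'e' = extended strand, 'c' = coil.
    Characters outside these three states are ignored.
    """
    counts = Counter(ch for ch in prd if ch in "hec")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no secondary structure states found")
    return {state: counts[state] / total for state in "hec"}


if __name__ == "__main__":
    # Last PRD row of this report (residues 121-132).
    print(structure_composition("chhhhhhhcccc"))
```

Concatenating all PRD rows of a report before calling the function gives the composition for the whole peptide, which can be compared with the structural class keyword of the report.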


Prosite for DKFZphtes3_20ml8. 1 

PS00029 27->49 LEUCINE_ZIPPER PDOC00029 
PS00215 26->36 MITOCH_CARRIER PDOC00189 
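
The leucine zipper hit can be reproduced with a regular expression for the PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L. In the sketch below the reporting convention (1-based start, end given as one past the last matched residue) is inferred from the 27->49 entry, since the four leucines sit at residues 27, 34, 41 and 48; it is an assumption, and the function name is hypothetical.

```python
import re

# PS00029 as a regex; the lookahead allows overlapping matches.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str):
    """Return (start, end) pairs in the style used by the Prosite table.

    Assumption: start is 1-based and end is one past the last residue.
    """
    hits = []
    for match in LEUCINE_ZIPPER.finditer(peptide.upper()):
        hits.append((match.start(1) + 1, match.end(1) + 1))
    return hits


if __name__ == "__main__":
    # First 50 residues of the frame 1 peptide.
    peptide = "MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNT"
    print(find_leucine_zippers(peptide))
```

The pattern is short and occurs frequently by chance, so a hit of this kind is only a hint and not evidence of a functional zipper.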


Pfam for DKFZphtes3_20ml8 . 1 


HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGnMeHTvMFPIDtIKTRMQlQgEMpM. .ahpR 

+++++++AGG +G + +++++P+++-I-KTR+Q++ ++ + ++ 
Query 5 DTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSS-SVTLYISEVQLNTMA 52 

HMM YkGMIdCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFY 

G+++C++ I+++EG+R+L+RGLG+N+++++P +AI+F+ Y 
Query 53 GASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

HMM BFMKeMFiDyfgeddnyWmWFwmnYMaGs* 

+KE ++D F++ D++++++ + +MAG+ 
Query 103 SNCKEKLNDVFDP-DSTQVHMISAAMAGM 130 
group: signal transduction 

DKFZphtes3_21d4 encodes a novel 464 amino acid putative GTP exchanging factor related to RCC1. 

RCC1 is a eukaryotic protein which binds to chromatin and interacts with Ran, a nuclear 
GTP-binding protein. RCC1 promotes the exchange of bound GDP with GTP, acting as a 
guanine-nucleotide dissociation stimulator. 

The new protein can find application in the regulation of gene expression by activation of 
[...] proteins. X-linked retinitis pigmentosa is the result of a defective GTPase 
regulator, which contains an RCC1-type repeat. 

similarity to RCC1-like G exchanging factor RLG 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="20" 
Insert length: 2321 bp 

Poly A stretch at pos. 2293, polyadenylation signal at pos. 2262 

1 GGGTCACGCA AGATGGCGGC GCCCAGAGGC TGCTGAGGCG CGGAACGGAG 
51 GATGGCGCTG GTGGCGTTGG TGGCTGGGGC TCGGCTGGGG CGGCGGCTGA 

101 GCGGGCCGGG GCTGGGGCGA GGGCACTGGA CGGCGGCCAG GCGCTCCCGG 

151 AGCCGGCGCG AAGCGGCAGA AGCCGAGGCG GAGGTGCCCG TGGTCCAGTA 

201 CGTGGGCGAG CGCGCTGCCC GCGCCGATCG CGTCTTCGTG TGGGGCTTCA 

251 GCTTCTCGGG GGCGCTGGGC GTGCCTTCCT TTGTGGTGCC CAGCTCCGGG 

301 CCCGGGCCCC GCGCCGGCGC CCGACCGCGC CGCAGGATCC AGCCCGTGCC 

351 CTATCGCCTG GAGCTGGACC AAAAGATTTC ATCTGCTGCT TGCGGCTATG 

401 GATTCACACT GCTGTCCTCT AAGACTGCGG ATGTTACGAA AGTCTGGGGG 

451 ATGGGACTCA ACAAAGATTC TCAGCTTGGA TTTCACAGGA GCCGGAAAGA 

501 TAAAACGAGG GGCTACGAGT ATGTGTTGGA GCCCTCACCC GTCTCCCTGC 

551 CTCTGGACAG ACCTCAGGAG ACACGGGTGC TGCAGGTCTC CTGCGGCCGA 

601 GCTCACTCTC TTGTGTTGAC TGACAGGGAA GGAGTCTTCA GCATGGGAAA 

651 CAATTCTTAT GGGCAATGTG GAAGAAAGGT GGTCGAAAAT GAAATTTACA 

701 GTGAAAGTCA CAGAGTCCAC AGGATGCAGG ACTTCGATGG CCAGGTGGTC 

751 CAGGTCGCCT GTGGTCAGGA TCATAGTCTG TTCCTGACGG ATAAAGGAGA 

801 AGTCTATTCT TGTGGATGGG GTGCTGATGG GCAAACAGGT CTGGGTCACT 

851 ACAATATCAC CAGCTCGCCC ACCAAGCTGG GTGGAGACCT GGCGGGAGTG 

901 AACGTTATCC AAGTTGCCAC CTACGGTGAT TGCTGCCTGG CCGTGTCCGC 

951 CGACGGAGGA CTTTTTGGTT GGGGAAACTC GGAGTACCTG CAGCTGGCCT 
1001 CTGTCACTGA CTCCACACAG GTGAATGTGC CCCGCTGCTT ACACTTCTCA 
1051 GGAGTGGGGA AGGTGCGACA GGCTGCATGC GGTGGCACGG GCTGTGCAGT 
1101 GTTAAACGGA GAAGGACATG TTTTTGTCTG GGGCTATGGA ATTCTTGGGA 
1151 AAGGTCCAAA CCTAGTGGAA AGTGCCGTCC CTGAAATGAT TCCACCCACT 
1201 CTCTTTGGCT TGACGGAGTT CAACCCAGAA ATCCAGGTTT CCCGCATCCG 
1251 ATGTGGACTC AGCCACTTTG CTGCACTGAC CAACAAAGGA GAGCTGTTTG 
1301 TATGGGGCAA GAACATCCGA GGGTGCCTGG GAATCGGTCG CCTGGAGGAC 
1351 CAGTATTTCC CATGGAGGGT GACGATGCCT GGGGAGCCTG TGGACGTGGC 
1401 ATGTGGCGTG GACCACATGG TGACCCTGGC CAAGTCATTC ATCTAAACCT 
1451 CCCTCACCTG CTTGGGCGGC CCCGTCCCGG GAACCACTGG CACTCCTTGG 
1501 CAGAGGCCAG CGCGTGGCCA GCCCCCCGGG GTTCTTGGAT GGTGGTGGCG 
1551 GAGGACCCTG CGTGCAGTGT GACGCTCTGT CCTGAATCCC TTAGCGGGTA 
1601 CCTACCAGGA GGATCAGGGC AAGGTCCCTC TCCAGCTGCA GGTGAGGCCT 
1651 GCGGAACTCA GCTTGGATGG CAGCCTTTGG TGGGCCGCTG TGGCCCGCAC 
1701 GTCTCTGTTC TCTCCAAGTA ACATGCGACG GTGTCTGGTG TCACGTCTCG 
1751 CCTGAGAAGC CCGTCTTAGG AAAGCTTAGC TTGAACACAG TGCTCGGGAG 
1801 GTTTCTGCTC TGTCTGTCAT GGCAGTCTCT TGGTTTGTGT CTGGCCAAGG 
1851 CCATGCGTGT GCCTCGGACC GAGCCCCAGC TTAGGCGAGG GAGTCAGGCT 
1901 GGCTTCGGCC CTCGGTTTTC ATTCAGGCCA CCCTGCTCAT GGCCCTTCCT 
1951 GGCCGCCTGC CACACCGCAA GCTCGCTGGG GGGACACTAG AAGCACCGTG 
2001 GCCTGGGATT CCATCTGGAG CTGTCCGCAG GCACCAGCCC CAGCCTCCCA 
2051 CCACGCTCAC TGCCTGGCTT GGAAAAGTTA AGAAGCCCCT CAGGAAGAGA 
2101 ATCGAGGCTA AGTTCCTCTG CGCCGAGGGC CCCGAGCATA TCCGCCAAGG 
2151 CTCAGCTGCA GTGCCAGGCG GAGGAGGAAG ATCCAGAAAT TGTGAACAAT 
2201 GTTTGATTTA GTAGCGTGAC TTGCCTTTCC CTTTAAAAAC ATCTTTTACA 
2251 AATCTGTCTT GGAATAAAGT CTATTTTCTG CCTTTTGGTT TTTAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA A 

BLAST Results 


Entry HS203358 from database EMBL: 
human STS SHGC-31781. 
Score = 1748, P = 1.1e-72, identities = 376/394 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 52 bp to 1443 bp; peptide length: 464 
Category: similarity to known protein 


1 MALVALVAGA RLGRRLSGPG LGRGHWTAAR RSRSRREAAE AEAEVPVVQY 
51 VGERAARADR VFVWGFSFSG ALGVPSFVVP SSGPGPRAGA RPRRRIQPVP 
101 YRLELDQKIS SAACGYGFTL LSSKTADVTK VWGMGLNKDS QLGFHRSRKD 
151 KTRGYEYVLE PSPVSLPLDR PQETRVLQVS CGRAHSLVLT DREGVFSMGN 
201 NSYGQCGRKV VENEIYSESH RVHRMQDFDG QVVQVACGQD HSLFLTDKGE 
251 VYSCGWGADG QTGLGHYNIT SSPTKLGGDL AGVNVIQVAT YGDCCLAVSA 
301 DGGLFGWGNS EYLQLASVTD STQVNVPRCL HFSGVGKVRQ AACGGTGCAV 
351 LNGEGHVFVW GYGILGKGPN LVESAVPEMI PPTLFGLTEF NPEIQVSRIR 
401 CGLSHFAALT NKGELFVWGK NIRGCLGIGR LEDQYFPWRV TMPGEPVDVA 
451 CGVDHMVTLA KSFI 


BLASTP hits 


Entry CEW09G3_5 from database TREMBLNEW: 
gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 
Score = 395, P = 9.3e-37, identities = 111/330, positives = 165/330 

Entry Y032_HUMAN from database SWISSPROT: 
HYPOTHETICAL PROTEIN KIAA0032. 
Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry B38919 from database PIR: 
hypothetical protein 2 - human (fragment) 
Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry AF060219_1 from database TREMBLNEW: 
product: "RCC1-like G exchanging factor RLG"; Homo sapiens RCC1-like G 
exchanging factor RLG mRNA, complete cds. 
Score = 273, P = 4.0e-21, identities = 84/262, positives = 124/262 

Entry S71752 from database PIR: 
giant protein p619 - human 
Score = 282, P = 1.1e-19, identities = 86/287, positives = 144/287 


Alert BLASTP hits for DKFZphtes3_21d4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_21d4, frame 1 


Report for DKFZphtes3_21d4.1 


[LENGTH] 464 

[MW] 49997.08 

[pI] 8.74 

[HOMOL] TREMBL:CEW09G3_5 gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 5e-34 

[FUNCAT] 04.07 rna transport [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins 
[S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YGL097w] 2e-09 

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 04.03.03 trna processing [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YAL020c] 4e-06 

[BLOCKS] BL00870I 

[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1) proteins 

[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins 

[PIRKW] blocked amino end 3e-16 

[PIRKW] nucleus 3e-16 

[PIRKW] duplication 4e-08 

[PIRKW] tandem repeat 3e-16 

[PIRKW] DNA binding 3e-16 

[PIRKW] mitosis 3e-16 

[PIRKW] leucine zipper 3e-21 

[SUPFAM] pheromone response pathway component SRM1 4e-08 

[SUPFAM] WD repeat homology 3e-21 

[PROSITE] MYRISTYL 7 

[PROSITE] RCC1_2 2 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] GLYCOSAMINOGLYCAN 3 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] Regulator of chromosome condensation (RCC1) 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 13.58 % 


SEQ MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR 
SEG .xxxxxxxxxxxxxxxxxxxxxxx. . . xxxxxxxxxxxxxxxxx 
PRD ccchhhhhhhhhheeeccccccccchhhhhhhhhhhhhhhhhhhceeeeeehhhhhhhhh 

SEQ VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD eeeeccccccccccceeeeeccccccccccccccccccccchhhhhhhhheeeccccceee 

SEQ LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS 
SEG 
PRD eecccccceeeeccccccccccccccccccccccceeeeeccccccccccccccceeeee 

SEQ CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD 
SEG 
PRD cccceeeeeeccceeeeeccccccccccccccccccccccccccccccceeeeeeecccc 

SEQ HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA 
SEG 
PRD eeeeeecccceeeecccccccccccccccccccccccccccceeeeeeecccceeeeeec 

SEQ DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW 
SEG 
PRD ccceeeeccccccccccccccccccccccccccccceeeeeccccceeeeeecccceeee 

SEQ GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK 
SEG 
PRD cccccccccccccccccccccceeeeeeecccceeeeeeecccceeeeeecccceeeecc 

SEQ NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI 
SEG 
PRD cccccccccccccccccceeecccceeeeecccccccccccccc 


Prosite for DKFZphtes3_21d4.1 

PS00001 200->204 ASN_GLYCOSYLATION PDOC00001 
PS00001 268->272 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 82->86 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 333->337 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005 
PS00005 219->222 PKC_PHOSPHO_SITE PDOC00005 
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005 
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005 
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00006 290->294 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00007 ... TYR_PHOSPHO_SITE PDOC00007 
PS00007 ... TYR_PHOSPHO_SITE PDOC00007 
PS00008 9->15 MYRISTYL PDOC00008 
PS00008 20->26 MYRISTYL PDOC00008 
PS00008 133->139 MYRISTYL PDOC00008 
PS00008 238->244 MYRISTYL PDOC00008 
PS00008 277->283 MYRISTYL PDOC00008 
PS00008 302->308 MYRISTYL PDOC00008 
PS00008 344->350 MYRISTYL PDOC00008 
PS00009 12->16 AMIDATION PDOC00009 
PS00009 206->210 AMIDATION PDOC00009 
PS00626 179->190 RCC1_2 PDOC00544 
PS00626 235->246 RCC1_2 PDOC00544 

Pfam for DKFZphtes3_21d4.1 


HMM_NAME Regulator of chromosome condensation (RCC1) 

HMM       *IAaGqHHTVCLTqDGRVYtWG* 
           +A GQ+H++ LT++G VY++G 
Query 235  VACGQDHSLFLTDKGEVYSCG 255 
DKFZphtes3_21jl5 


group: transcription factors 

DKFZphtes3_21jl5 encodes a novel 898 amino acid protein with similarity to human NY-CO-33 
protein. 

NY-CO-33 is a protein recognised by autologous antibodies of human colon cancer patients. The 
novel protein contains 4 C2H2 zinc fingers and is a new putative transcription factor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 

strong similarity to "NY-CO-33" 

complete cDNA, complete cds, potential start at bp 27, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 4407 bp 

Poly A stretch at pos. 4321, polyadenylation signal at pos. 4301 

1 CGCTGCAGCA GGTGTCACAG AGCCGCATGC TCCCGGAGCC CAGCCTCTTC 
51 AGCACCGTGC AGCTGTACCG GCAGAGCAGC AAGCTCTATG GCTCCATCTT 

101 CACGGGGGCC AGCAAGTTCC GCTGTAAGGA CTGCAGCGCT GCCTACGACA 

151 CCCTGGTGGA GTTGACAGTG CACATGAACG AGACGGGGCA TTACCGCGAC 

201 GACAACCATG AGACCGATAA CAACAACCCC AAGCGCTGGT CCAAGCCTCG 

251 CAAACGCTCC TTGCTGGAAA TGGAAGGGAA GGAAGACGCC CAGAAGGTGC 

301 TGAAGTGCAT GTACTGTGGC CACTCCTTTG AGTCCCTGCA GGATTTGAGT 

351 GTCCATATGA TCAAAACAAA ACACTACCAA AAAGTGCCTC TGAAGGAACC 

401 CGTCACTCCT GTCGCCGCCA AAATCATCCC TGCCACTCGG AAGAAAGCTT 

451 CCCTGGAGCT GGAGCTCCCC AGCTCCCCAG ATTCCACAGG TGGAACCCCC 

501 AAAGCCACCA TCTCAGACAC CAACGATGCA CTTCAGAAGA ACTCCAACCC 

551 TTACATCACG CCAAATAATC GGTACGGCCA CCAGAATGGG GCCAGCTATG 

601 CATGGCACTT TGAGGCCCGG AAGTCGCAGA TCCTGAAGTG CATGGAGTGT 

651 GGGAGCTCGC ATGACACCCT GCAGGAGCTC ACTGCCCACA TGATGGTCAC 

701 TGGCCACTTC ATCAAGGTCA CCAACTCTGC TATGAAAAAG GGGAAGCCCA 

751 TTGTGGAGAC GCCTGTCACA CCTACCATCA CAACCCTGCT GGATGAGAAG 

801 GTCCAGTCCG TGCCCCTGGC AGCCACCACC TTCACGTCCC CCTCCAATAC 

851 ACCTGCCAGC ATCTCCCCAA AACTGAATGT GGAGGTCAAG AAGGAAGTCG 

901 ACAAGGAGAA AGCGGTCACT GACGAGAAAC CTAAGCAAAA AGACAAGCCT 

951 GGCGAAGAAG AGGAGAAGTG TGACATCTCT TCCAAATACC ATTACTTGAC 
1001 TGAAAATGAC TTAGAAGAGA GTCCCAAGGG GGGGCTTGAT ATCCTCAAAT 
1051 CCTTGGAAAA CACAGTGACA TCCGCAATCA ACAAGGCCCA GAACGGCACT 
1101 CCTAGCTGGG GGGGCTATCC CAGCATCCAT GCCGCCTACC AACTTCCCAA 
1151 CATGATGAAG TTGTCCCTGG GCTCGTCGGG GAAGAGCACG CCCCTGAAAC 
1201 CCATGTTTGG CAACAGTGAG ATTGTCTCCC CGACGAAAAA CCAGACCCTG 
1251 GTCTCTCCAC CCAGCAGCCA GACGTCCCCC ATGCCCAAGA CAAACTTTCA 
1301 TGCCATGGAG GAGCTGGTGA AAAAGGTCAC TGAGAAAGTT GCCAAAGTGG 
1351 AGGAGAAGAT GAAGGAGCCG GATGGGAAGC TTTCCCCGCC CAAGCGGGCC 
1401 ACTCCCTCCC CATGTAGCAG CGAAGTCGGG GAACCCATCA AGATGGAGGC 
1451 ATCCAGCGAT GGGGGCTTCC GCAGCCAGGA GAACAGCCCC AGCCCCCCGC 
1501 GGGATGGGTG CAAGGATGGG AGCCCCCTCG CTGAGCCGGT GGAGAATGGC 
1551 AAGGAGCTGG TGAAGCCCCT AGCCAGCAGT TTGAGTGGCA GCACGGCCAT 
1601 CATCACCGAC CACCCGCCTG AACAGCCTTT TGTTAACCCT TTGAGCGCCC 
1651 TGCAGTCAGT CATGAACATT CACCTGGGCA AGGCCGCCAA GCCCTCCCTG 
1701 CCTGCCCTGG ACCCCATGAG CATGCTTTTC AAGATGAGCA ACAGCCTGGC 
1751 GGAGAAGGCT GCTGTGGCCA CCCCGCCGCC CCTGCAGTCC AAGAAGGCAG 
1801 ACCACCTCGA CCGCTATTTC TACCACGTCA ACAACGACCA GCCCATAGAC 
1851 TTGACAAAAG GGAAGAGTGA CAAAGGCTGC TCCTTGGGTT CAGTGCTTCT 
1901 GTCACCCACG TCCACAGCCC CGGCAACCTC CTCATCCACG GTGACAACGG 
1951 CAAAGACATC TGCCGTCGTA TCATTCATGT CAAACTCGCC GCTACGCGAG 
2001 AATGCCTTGT CAGATATATC CGATATGCTG AAGAACTTGA CAGAGAGCCA 
2051 CACGTCAAAA TCCTCCACTC CTTCCAGCAT CTCCGAGAAG TCTGACATTG 
2101 ACGGGGCCAC TCTGGAGGAG GCTGAGGAGT CGACGCCCGC CCAGAAGAGG 
2151 AAGGGCCGCC AGTCAAACTG GAACCCCCAG CACCTCCTGA TCCTCCAGGC 
2201 CCAGTTTGCC GCCAGCCTCC GGCAGACCTC AGAAGGGAAG TACATCATGT 
2251 CAGACCTGAG CCCCCAGGAG CGGATGCATA TCTCCAGGTT CACCGGGCTG 
2301 TCCATGACCA CCATCAGCCA CTGGCTGGCC AACGTGAAAT ACCAGCTTCG 
2351 AAGGACAGGT GGAACAAAGT TCCTCAAAAA CTTGGACACT GGCCACCCCG 
2401 TCTTCTTTTG TAACGATTGT GCGTCCCAAA TCAGGACTCC TTCCACGTAC 
2451 ATCAGTCACC TAGAGTCACA CTTAGGCTTC CGGCTACGGG ACTTATCCAA 
2501 ACTGTCCACC GAACAGATTA ACAGTCAGAT AGCACAAACC AAGTCACCGT 
2551 CAGAAAAAAT GGTGACGTCC TCCCCCGAGG AAGACCTGGG GACTTCCTAT 
2601 CAGTGCAAAC TTTGCAATCG GACCTTTGCC AGCAAGCACG CTGTTAAACT 
2651 TCACCTTAGC AAAACACACG GGAAATCTCC GGAAGACCAC CTTCTGTATG 
2701 TCTCTGAGTT AGAGAAGCAG TAGCATTTGC TTTTGATAGA AAGGACTGCA 
2751 GTTTGCTTTG AGGGAAACTG TGGAAGGCAC CTTCAGGCCC CCTCTGACTT 
2801 GTTGTTCTTG GCACATGTTC TTATTTTAAC TGCAGAGAAT CACTCTGGGC 
2851 TGGACTGTTT TGTATAACTG TACAGTGTTT AATAGAGGTG CATAATCAGC 
2901 TGTTGTTACT GGTAAAATAT GAAGGTTAAA ATGCAGTGGT AAGTGTTTGG 
2951 AACTTTGTGT AAACGGGATT TAGTTGTGAG CATCCTCCCG ATGCTTCAAG 
3001 CTGCATGCAT TAACAGACAG TTTAATTAAG CATTTATAAC GGAATCAGGC 
3051 ACACCTTTTC CACGAGACTC GAGTGTGCTG GCATTTCTCA CCCTTTCATC 
3101 TTTAGCCCTC TGAGTACTTT GAAGCACTTT TGCATTAATT TGGTTAAAAA 
3151 ATAAAATAAA ATAATAATAA TGTATGAAGC TCTGTTTTTT AAACTCCTTA 
3201 CCAGCTTAGT TATAATGAAT AATATGAACC TCCATTTATG CAGGTCTGCA 
3251 GGGGTATAAC ACGCCTTGAA ATTTAAAAGA ATATTATTTT CACATTGAAA 
3301 CATAGATGTA TATATTGTAT AGATTTCAGA CTCTCTTATG AAAAAAAATG 
3351 TGATTGTGGT TAAATGACCT TTTTCTTGCA TTTATAGCAA CAGTGTTTTA 
3401 TGCACCTGCT ATGCTCTGGG CATAAGCTGT GCCTATGTAT AGTGTATATT 
3451 TCTTTTTTTC TTTTTTTTAA GGTCTATGGG TTTTGTTTTT TACATGCAAA 
3501 CATTGTAAAT TATACAGAAG ATACCACAGA TAGCATTTAT AAAGTATACA 
3551 GAAACATTAT CTGAAAGCAA AGTATGATAG TTTGTTTTGC TATACAGTAC 
3601 ATCTATATTG ATAGAGGTTC ATGTTTAAAT TATACATATT TATTAGCATC 
3651 ATATTGTCAT TTGTTTTGAG CAGTCTGAAT AAACGAGACC GGGAAAGACA 
3701 TCCCTGGCAG GCATCAGAAC TATTTTGCAC ATGATTTTTA AAGGTATTTA 
3751 TTAGAAATCA AAGAACACTC AAAATAAACT CAGTGCTCAA AGGGTTAAGT 
3801 CTATTTGAAA AGGTTAAAAA AAAGAACAAA AAAAAAAAAA GAACTTGTAC 
3851 TGTATTTCCT AAACATTGAT AAAGCCTTTA AAATGTTTGT ACTGTAATAC 
3901 TTTGCTTAAA AGTCATGAGG CATTCTGTGA TCCAACCTCT TTCACTTATT 
3951 TATAAGCCCT CTTGGTTGCT ATTCCATATT GTAGGATGCC TTTCTATTTC 
4001 AATTGGTAAC TTTCTGTTTT GTTCTTCCTA ATTATTCTCC CAAGATCCCA 
4051 CACTGCAGCT TTATCTTTAG GCTTATGAAA GGTAACCCGT GGTTACCGGC 
4101 TCTCCAAGTG ATTCTGTTCT TCTCCATTTT TGGCAGTTAA TTTGCAGAAG 
4151 TAACTGACAG CTGACACCAT ATGAGAACCT TTGTATAAAA TATTGGCATG 
4201 TAAACAGCAC AGACACCGTA ACACACTCTG TGCCCTGTTT GGTTGTTGAC 
4251 AATGAAGCAC CATTATGTGA CTCTTCATAT AACCCTTTTT TCTACGGCAG 
4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4401 AAAAAAA 
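
Numbered listings like the one above can be joined into a single sequence while checking that the position labels are continuous, which helps to catch extraction errors in the line numbers. The sketch below is an illustration only: the function name is hypothetical, and it assumes each row starts with the 1-based position of its first nucleotide, followed by blocks of bases.

```python
def join_numbered_lines(lines, alphabet="ACGTN"):
    """Join numbered sequence rows and verify that numbering is continuous.

    Returns (first_position, sequence). Raises ValueError when a row label
    is not an integer, a block contains characters outside `alphabet`, or a
    row does not start where the previous one ended.
    """
    parts = []
    first = None
    expected = None
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank rows
        position = int(tokens[0])
        chunk = "".join(tokens[1:]).upper()
        if set(chunk) - set(alphabet):
            raise ValueError(f"unexpected characters in row {position}")
        if expected is not None and position != expected:
            raise ValueError(f"row labelled {position}, expected {expected}")
        if first is None:
            first = position
        parts.append(chunk)
        expected = position + len(chunk)
    if first is None:
        raise ValueError("no sequence rows found")
    return first, "".join(parts)


if __name__ == "__main__":
    # Last three rows of the listing above, built from 10-nt blocks.
    poly_a = "A" * 10
    tail = [
        "4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA " + " ".join([poly_a] * 2),
        "4351 " + " ".join([poly_a] * 5),
        "4401 " + "A" * 7,
    ]
    start, seq = join_numbered_lines(tail)
    print(start + len(seq) - 1)  # last position covered by the rows
```

For these rows the last covered position equals the stated insert length of 4407 bp.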


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 27 bp to 2720 bp; peptide length: 898 
Category: strong similarity to known protein 


1 MLPEPSLFST VQLYRQSSKL YGSIFTGASK FRCKDCSAAY DTLVELTVHM 
51 NETGHYRDDN HETDNNNPKR WSKPRKRSLL EMEGKEDAQK VLKCMYCGHS 
101 FESLQDLSVH MIKTKHYQKV PLKEPVTPVA AKIIPATRKK ASLELELPSS 
151 PDSTGGTPKA TISDTNDALQ KNSNPYITPN NRYGHQNGAS YAWHFEARKS 
201 QILKCMECGS SHDTLQELTA HMMVTGHFIK VTNSAMKKGK PIVETPVTPT 
251 ITTLLDEKVQ SVPLAATTFT SPSNTPASIS PKLNVEVKKE VDKEKAVTDE 
301 KPKQKDKPGE EEEKCDISSK YHYLTENDLE ESPKGGLDIL KSLENTVTSA 
351 INKAQNGTPS WGGYPSIHAA YQLPNMMKLS LGSSGKSTPL KPMFGNSEIV 
401 SPTKNQTLVS PPSSQTSPMP KTNFHAMEEL VKKVTEKVAK VEEKMKEPDG 
451 KLSPPKRATP SPCSSEVGEP IKMEASSDGG FRSQENSPSP PRDGCKDGSP 
501 LAEPVENGKE LVKPLASSLS GSTAIITDHP PEQPFVNPLS ALQSVMNIHL 
551 GKAAKPSLPA LDPMSMLFKM SNSLAEKAAV ATPPPLQSKK ADHLDRYFYH 
601 VNNDQPIDLT KGKSDKGCSL GSVLLSPTST APATSSSTVT TAKTSAVVSF 
651 MSNSPLRENA LSDISDMLKN LTESHTSKSS TPSSISEKSD IDGATLEEAE 
701 ESTPAQKRKG RQSNWNPQHL LILQAQFAAS LRQTSEGKYI MSDLSPQERM 
751 HISRFTGLSM TTISHWLANV KYQLRRTGGT KFLKNLDTGH PVFFCNDCAS 
801 QIRTPSTYIS HLESHLGFRL RDLSKLSTEQ INSQIAQTKS PSEKMVTSSP 
851 EEDLGTSYQC KLCNRTFASK HAVKLHLSKT HGKSPEDHLL YVSELEKQ 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_21jl5, frame 3 

TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds., N = 1, Score = 
1039, P = 5.5e-105 

PIR:A38437 probable homeotic protein tsh - fruit fly (Drosophila 
melanogaster), N = 3, Score = 158, P = 7.2e-09 

TREMBL:CE33058_1 gene: "unc-89"; product: "UNC-89"; Caenorhabditis 
elegans UNC-89 (unc-89) gene, complete cds., N = 2, Score = 175, P = 
3.3e-07 


>TREMBL:AF039698__1 gene: •'NY'CO-33"; product: "antigen Ny-CO-33"; 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 
Length = 687 

HSPs: 


Homo 


score - 1039 (155.9 bits). Expect 5.5e-105, P «- 5.5e-105 
Identities « 244/504 (48%), Positives « 319/504 (63%) 


Query: 

Sbjct: 

Query: 

Sbjct: 

Query : 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 


170 QKNSNPYITPNNRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFI 229 
QK +NPY+TPNNRYG+QNGASY W FEARK+QILKCMECGSSHDTLQ+LTAHMMVTGHF+ 
14 QKAANPYVTPNNRYGYQNGASYTWQFEARKAQILKCMECGSSHDTLQQLTAHMMVTGHFL 73 

230 KVTNSAMKKGKPIVETPVTPTITTLLDEKVQSVPLAATTFTS-PSNT PASISPKLN 284 

KVT SA KKGK +V PV ++EK+QS+PL TT T P+++ P S + 

74 KVTTSASKKGKQLVLDPV VEEKIQSIPLPPTTHTRLPASSIKKQPDSPAGSTT 126 

285 VE VKKEVDKEKA- VTDEKPKQKDKPGEEEEKCDISSKYHYLTENDLEES PKGGLDILKSL 343 

E KKE +KEK V + K K++ + EK + S+ Y YL E DL++S PKGGLDILKSL 
127 SEEKKEPEKEKPPVAGDAEKIKEESEDSLEKFEPSTLYPYLREEDLDDSPKGGLDILKSL 186 

344 ENTVTSAINKAQNGTPSWGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMF-GNSEIVSP 402 

ENTV++AI+KAQNG PSWGGYPSIHAAYQLP +K L ++ +8 ++P + G + +S 
187 ENTVSTAISKAQNGAPSWGGYPSIHAAYQLPGTVK-PLPAAVQSVQVQPSYAGGVKSLSS 245 

403 TKNQTLVSPPSSQTSPMPKTNFHAMEELVKKVTEKV-AKVEEKMKEPDGKLSPPKRATPS 461 

++ L+ P S T P K+N AMEELV+KVT KV K EE+ E + K S K A S 
246 AEHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKE-KSSLAKAA--S 302 
462 PCSSEVGEPIKMEASSDGGFRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSG 521 
P + E + KES +Q+P KPL NGE+K++ 

303 PIAKENKDFPKTEEVSG KPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCN 359 

522 STAIITDHPPEQPFVNPLSALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVA 581 

+ II DH PE F+NPLSALQS+MN HLGK +KP P+LDP++ML+K+SNS+ +K 
360 NLGIIMDHSPEPSFINPLSALQSIMNTHLGKVSKPVSPSLDPLAMLYKISMSMLDKPVYP 419 

582 TPPPLQSKKADHLDRYFYHVNNDQPIDLTKGKSDK-GCSLGSVLLSPTSTAPATSSSTVT 640 

P K+AD +DRY+Y N+DQPIDLTK K+ S+ + SP + S + 

420 ATPV KQADAIDRYYYE-NSDQPIDLTKSKNKPLVSSVADSVASPLRESALMDISDMV 475 

641 TAKTSAVVSFMSN-SPLRENALSDISDMLKNLTE 673 

T+ SS + E + +DS +LE 
476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDE 509 


Score = 865 (129.8 bits), Expect = 7.4e-95, P = 7.4e-95 
Identities = 211/434 (48%), Positives = 268/434 (61%) 

447 EPDGKLSPPKRATPSPCSSEVG— EPIKMEASSDGGFRSQENSPSPPRDG-CKDGSPLAE 503 

E+LP TPPSV E+++ ++EP + K SP+A+ 

247 EHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKEKSSLAKAASPIAK 306 

504 P-VE— NGKELVK-PLASSLSGSTAIITD-HPPE— QPFVNPLSALQSVMNIHLG 551 

PE+GK KPA+ DHP +P ++ + + I + 

307 ENKDFPKTEEVSGKPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCNNLGIIMD 366 

552 KAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYHVNN— DQPID 608 

+ +PS ++P+S L + N+ K 4 PL D L Y ++N D+P+ 

367 HSPEPSF— INPLSALQSIMNTHLGKVSKPVSPSL DPL-AMLYKISNSMLDKPV- 417 

609 LTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENALSDISDML 668 

K S P + + S+V ++ SPLRE+AL DISDM+ 

418 -YPATPVKQADAIDRYYYENSDQPIDLTKSKNKPLVSSVADSVA-SPLRESALMDISDMV 475 

Query: 669 KNLTESHTSKSSTPSSISEKSDIDGATLEEA-EESTPAQKRKGRQSNWNPQHLLILQAQF 727 
           KNLT   T KSSTPS++SEKSD DG++ EEA +E +P  KRKGRQSNWNPQHLLILQAQF 
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDELSPVHKRKGRQSNWNPQHLLILQAQF 535 

Query: 728 AASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 787 

A+SLR+T+EGKYIMSDL PQER+HIS+FTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 
Sbjct: 536 ASSLRETTEGKYIMSDLGPQERVHISKFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 595 

Query: 788 TGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKSPSEKMV- 846 

TGHPVFFCNDCASQ RT STYISHLE+HLGF L+DLSKL QI Q +K + K + 
Sbjct: 596 TGHPVFFCNDCASQFRTASTYISHLETHLGFSLKDLSKLPLNQIQEQQNVSKVLTNKTLG 655 

Query: 847 -TSSPEEDLGTSYQCKLCNRTFASK 870 

+ EEDLG+++QCKLCNRTFA + 
Sbjct: 656 PLGATEEDLGSTFQCKLCNRTFAKQ 680 

Score = 98 (14.7 bits), Expect = 7.4e-95, P = 7.4e-95 
Identities = 32/95 (33%), Positives = 47/95 (49%) 

Query: 90 KVLKCMYCGHSFESLQDLSVHMIKTKHYQKVPL KEPVT-PVAAKIIPATRKKAS 142 

++LKCM CG S ++LQ L+ HM+ T H+ KV K+ V PV + I + + 

Sbjct: 45 QILKCMECGSSHDTLQQLTAHMMVTGHFLKVTTSASKKGKQLVLDPVVEEKIQSIPLPPT 104 

Query: 143 LELELPSS PDSTGGTPKATISDTNDALQKNSNP 175 

LP+S PDS G+ T S+ +K P 

Sbjct: 105 THTRLPASSIKKQPDSPAGS TTSEEKKEPEKEKPP 139 

Score = 81 (12.2 bits), Expect = 4.6e-93, P = 4.6e-93 
Identities = 13/29 (44%), Positives = 20/29 (68%) 

Query: 28 ASKFRCKDCSAAYDTLVELTVHMNETGHY 56 

A +C +C +++DTL +LT HM TGH+ 
Sbjct: 44 AQILKCMECGSSHDTLQQLTAHMMVTGHF 72 


Pedant information for DKFZphtes3_21j15, frame 3 


Report for DKFZphtes3_21j15.3 


[LENGTH] 898 
[MW] 98486.72 
[pI] 8.61 
[HOMOL] TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 0.0 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[PIRKW] zinc finger 1e-06 
[PIRKW] DNA binding 1e-06 
[PIRKW] transcription regulation 1e-06 
[PROSITE] MYRISTYL 9 
[PROSITE] ZINC_FINGER_C2H2 4 
[PROSITE] CAMP_PHOSPHO_SITE 5 
[PROSITE] CK2_PHOSPHO_SITE 19 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 15 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] Zinc finger, C2H2 type 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.36 % 


SEQ MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHMNETGHYRDDN 
SEG 
PRD ccccceeeeeeeeccccceeeeeeeccccceeecccchhhhhhhhhhhcccccccccccc 

SEQ HETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHSFESLQDLSVHMIKTKHYQKV 
SEG 
PRD cccccccccccccccchhhhhhhccchhhhhhhhhcccccchhhhheeeeeeeecceeee 

SEQ PLKEPVTPVAAKIIPATRKKASLELELPSSPDSTGGTPKATISDTNDALQKNSNPYITPN 
SEG xxxxxxxxxx 
PRD eccccccceeeeeeehhhhhhhhhhcccccccccccccceeeeccchhhhhccccccccc 

SEQ NRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFIKVTNSAMKKGK 
SEG 
PRD ccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhceeeeeccccccccc 

SEQ PIVETPVTPTITTLLDEKVQSVPLAATTFTSPSNTPASISPKLNVEVKKEVDKEKAVTDE 
SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxx 
PRD ccccccccccchhhhhhhhccccccccccccccccccccccccccccccccchhhhhhcc 

SEQ KPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSLENTVTSAINKAQNGTPS 
SEG 
PRD ccccccccccccccchhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhcccccc 

SEQ WGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMFGNSEIVSPTKNQTLVSPPSSQTSPMP 
SEG 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ KTNFHAMEELVKKVTEKVAKVEEKMKEPDGKLSPPKRATPSPCSSEVGEPIKMEASSDGG 
SEG xxxxxxxxxxxxxxxxxxxx 
PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeecccc 

SEQ FRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSGSTAIITDHPPEQPFVNPLS 
SEG 
PRD cccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccc 

SEQ ALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYH 
SEG 
PRD chhhhhhcccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccceeee 

SEQ VNNDQPIDLTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENA 
SEG xxxxxxxxxxxxxxxxxxxxxxxx 
PRD ecccccceeecccccccccccceeecccccccccccceeeeceeeeeeeeccccccchhh 

SEQ LSDISDMLKNLTESHTSKSSTPSSISEKSDIDGATLEEAEESTPAQKRKGRQSNWNPQHL 
SEG xxxxxxxxxxxxxxxxxx 
PRD hhhhhhhhhhhhcccccccccccceeecccccchhhhhhhhccchhhhhhcccccccchh 

SEQ LILQAQFAASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGT 
SEG 
PRD hhhhhhhhhhhhhccccceeecccccchhhhhhhhccccchhhhhhhhhhhhhhhhcccc 

SEQ KFLKNLDTGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKS 
SEG 
PRD ceeecccccccceeecccceeeecccchhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcc 

SEQ PSEKMVTSSPEEDLGTSYQCKLCNRTFASKHAVKLHLSKTHGKSPEDHLLYVSELEKQ 
SEG 
PRD ccceeeeccccccccceeehhhhhhhhhhhhhhhhhccccccccccceeeeeeecccc 


Prosite for DKFZphtes3_21j15.3 

PS00001 51->55 ASN_GLYCOSYLATION PDOC00001 
PS00001 405->409 ASN_GLYCOSYLATION PDOC00001 
PS00001 670->674 ASN_GLYCOSYLATION PDOC00001 
PS00001 864->868 ASN_GLYCOSYLATION PDOC00001 
PS00004 69->73 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 75->79 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 139->143 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 432->436 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 456->460 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 17->20 PKC_PHOSPHO_SITE PDOC00005 
PS00005 137->140 PKC_PHOSPHO_SITE PDOC00005 
PS00005 157->160 PKC_PHOSPHO_SITE PDOC00005 
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005 
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005 
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005 
PS00005 588->591 PKC_PHOSPHO_SITE PDOC00005 
PS00005 614->617 PKC_PHOSPHO_SITE PDOC00005 
PS00005 641->644 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00005 686->689 PKC_PHOSPHO_SITE PDOC00005 
PS00005 730->733 PKC_PHOSPHO_SITE PDOC00005 
PS00005 842->845 PKC_PHOSPHO_SITE PDOC00005 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 78->82 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006 
PS00006 161->165 CK2_PHOSPHO_SITE PDOC00006 
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006 
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006 
PS00006 253->257 CK2_PHOSPHO_SITE PDOC00006 
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006 
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006 
PS00006 684->688 CK2_PHOSPHO_SITE PDOC00006 
PS00006 689->693 CK2_PHOSPHO_SITE PDOC00006 
PS00006 695->699 CK2_PHOSPHO_SITE PDOC00006 
PS00006 745->749 CK2_PHOSPHO_SITE PDOC00006 
PS00006 810->814 CK2_PHOSPHO_SITE PDOC00006 
PS00006 840->844 CK2_PHOSPHO_SITE PDOC00006 
PS00006 848->852 CK2_PHOSPHO_SITE PDOC00006 
PS00006 884->888 CK2_PHOSPHO_SITE PDOC00006 
PS00006 893->897 CK2_PHOSPHO_SITE PDOC00006 
PS00007 732->740 TYR_PHOSPHO_SITE PDOC00007 
PS00007 883->892 TYR_PHOSPHO_SITE PDOC00007 
PS00008 22->28 MYRISTYL PDOC00008 
PS00008 156->162 MYRISTYL PDOC00008 
PS00008 188->194 MYRISTYL PDOC00008 
PS00008 362->368 MYRISTYL PDOC00008 
PS00008 479->485 MYRISTYL PDOC00008 
PS00008 494->500 MYRISTYL PDOC00008 
PS00008 498->504 MYRISTYL PDOC00008 
PS00008 617->623 MYRISTYL PDOC00008 
PS00008 757->763 MYRISTYL PDOC00008 
PS00028 795->816 ZINC_FINGER_C2H2 PDOC00028 
PS00028 860->882 ZINC_FINGER_C2H2 PDOC00028 
PS00028 33->56 ZINC_FINGER_C2H2 PDOC00028 
PS00028 94->117 ZINC_FINGER_C2H2 PDOC00028 


Pfam for DKFZphtes3_21j15.3 


HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR..T.H* 
C++ C ++ + +L+ HM+ H 
Query 33 CKD--CSAAYDTLVELTVHMNET-GH 55 

26.69 (bits) f: 94 t: 116 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMR..T.H* 
C + CG +F + +L HM+ H 
Query 94 CMY--CGHSFESLQDLSVHMIKT-KH 116 

Query f: 795 t: 815 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C++ C R++S+++ H+ +H 
Query 795 CND--CASQIRTPSTYISHLESH 815 

27.12 (bits) f: 860 t: 881 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C++ C++TF +++ + H+ H 
Query 860 CKL--CNRTFASKHAVKLHLSK-TH 881 
group: intracellular transport and trafficking 

DKFZphtes3_21l16 encodes a novel 66 amino acid protein nearly identical to rat ribosome 
attached membrane protein 4 (RAMP4). 

The novel protein appears to be the human ortholog of rat RAMP4. RAMP4 is involved in the 
regulation of translocation of proteins into the endoplasmic reticulum, e.g. of the MHC class II 
associated invariant (gamma) chain. 

The new protein can find application in modulating protein translocation into the 
endoplasmic reticulum. 


identical to rat ribosome attached membrane protein 4 

ORF Bp 316-513 (66 aa) see BLASTX 

Sequenced by LMU 

Locus : unknown 

Insert length: 2488 bp 

Poly A stretch at pos. 2464, polyadenylation signal at pos. 2442 


1 CTTCCTCTTT CACTCCGCGC TCACGGCGGC GGCCAAAGCG GCGGCGACGG 
51 CGGCGCGAGA ACGACCCGGC GGCCAGTTCT CTTCCTCCTG CGCACCTGCC 
101 CCGCTCGGTC AGTCAGTCGG CGGCCGGCGC CCGGCTTGTG CTCAGACCTC 
151 GCGCTTGCGG CGCCCAGGCC CAGCGGCCGT AGCTAGCGTC TGGCCTGAGA 
201 ACCTCGGCGC TCCGGCGGCG CGGGCACCAC GAGCCGAGCC TCGCAGCGGC 
251 TCCAGAGGAG GCAGGCGAGT GAGCGAGTCC GAGGGGTGGC CGGGGCAGGT 
301 GGTGGCGCCG CGAAGATGGT CGCCAAGCAA AGGATCCGTA TGGCCAACGA 
351 GAAGCACAGC AAGAACATCA CCCAGCGCGG CAACGTCGCC AAGACCTCGA 
401 GAAATGCCCC CGAAGAGAAG GCGTCTGTAG GACCCTGGTT ATTGGCTCTC 
451 TTCATTTTTG TTGTCTGTGG TTCTGCAATT TTCCAGATTA TTCAAAGTAT 
501 CAGGATGGGC ATGTGAAGTG ACTGACCTTA AGATGTTTCC ATTCTCCTGT 
551 GAATTTTAAC TTGAACTCAT TCCTGATGTT TGATACCCTG GTTGAAAACA 
601 ATTCAGTAAA GCATCCTGCC TCAGAATGAC TTTCCTATCA TGCTTCATGT 
651 GTCATTCCAA GGTTTCTTCA TGAGTCATTC CAAGTTTTCT AGTCCATACC 
701 ACAGTGCCTT GCAAAAAACA CCACATGAAT AAAGCAATAA AATTTGATTG 
751 TTAAGATACA GTAGTGGACC CTACTTATTC AGTCAATTAA GAGTAAGTTT 
801 TTTTATGTGG TTATTAAAAC AGTATGAACA ATTAGTCTAA CTCTGCATAG 
851 ACAGGGTCTA GATTTTGTTA ACCCAAATGT ATAACTGCAG TTAGCTTAAA 
901 TTACAATTTG AAGTCTTGTG GTTTTTATAT AGCTAGGCAC TTTATTACTC 
951 TTTTGAACTG AAAGCACACT CCCTTATAGG TTCATGTAAC TGTCCTGTAA 
1001 TAAGGTGCTT ATAAATGGAA CAACTACACA GCCTAGTTTT GCCACAACCT 
1051 TTAGCATCTA AAAAGTTTTA AAAGCTTCTA AATGTCTAAT ATAAAGGGAG 
1101 ATGCTTATAG CCACAACATC TATTTTACCA ATATTGTTTC CATTACACTA 
1151 CCTTGGATTT TGCATGAGTG AGTATAGTAA CCCAAGATGC CATAAAAAAA 
1201 AACTTGATCG TTTTCTGACT TAATTAGTTA CTGTGGTTTC ACTAAAAGCT 
1251 ACCGTGGTGG AGTGAAGTCA GTCAGGGAAG GTTTGTTTAT GTTACATTTA 
1301 TTTCACCAGA ACTATTTTAA TATATCAAAG GGGTTTACTA TGCCAAACAA 
1351 AATTCTAGGG AAAAATACTG CTAAAAATGG ATGCCTCATC AGAACATGCT 
1401 GTTGAGTCCA ATGTGCCATA AGACATTTTA GCATGTTAAA TAGCACTTTT 
1451 AATAGCAAAA AAAGGCACAT CAACTGCGAA GTTATCCTTA GTTTGCAAAT 
1501 GCTTTTTCTA CATTAATGAT TTTTCAATCA TTAGGGTACT AGACACATCA 
1551 GCCTAAAGTG GCATCTGGAA TTGAATGGAT TTACTGATAA TGATCAGTCT 
1601 TTAGTCTTCC CTTTGTTATA TGACTTTATA GGTTATGATT GATCAAATTT 
1651 ACGTTTTACT AATGGTAAGG GTGAGGGTCA TAGGGCAGGT TTTGGGTTTT 
1701 CTAGTACTGT TGAAAACTGC AAGTATTGGC TATTTGTATA CTTAGCCATA 
1751 ACTTGGTGAA AAAAAACCTG AGCAGTGTCT ATGTATTAAT GCGTTGGAAA 
1801 GAAAGCTGCT TGTGTTTGCT TTGTTAATTG CCTCAGGATA TTTCTTTTAA 
1851 AATAAGCTGT TTTAAGAGGA ACAGAAGGGA AATCTGCTAC CTAGTCTATA 
1901 CACAGCGTGA ACCTCACAGG GGGCTTCTGA TACCCTCAAA CATGGAGAAC 
1951 AGTAAGGGAG CAGAGTGGTT AAGGACTTTC AGGAACTTAA CTATTCTGGA 
2001 ATAAGGAATG AATCAACTGA CCTTGGGCCA GCAGGTTTTT AACTAAATTG 
2051 TTACTTGCCT TTCTCACCCA GTTAATCAGT CTCTGTACTT GTTTCCCTTT 
2101 TTGAAACAAG TGTCTTGGTT AACTAATTCT GTTTTATGGT TGTGCTAAAT 
2151 TCATAGCAGG TGCCTTATTC TTTGCTTTTA GTCAAACCAT TCCATATCAG 
2201 AATTTTCCTT GGTTTACTAT AGATATTTGG CTTTAAGTTG TTGTTTGTGT 
2251 TTTTTAATGT ACAATCTTCT GATAAATTTG ACTGTTAAAT TGCTATAGCT 
2301 AGCAATCATT TTACATATGT AAAAAATTGC ATTCCCTTTG TATTTCATGT 
2351 GTAATTCACC AATTAAGTGC AGTTTATATT CAGGTTGGAT TATGCATGTT 
2401 TAGGTAAACG AAAGCTGTGT CTTACTTGAT TTATTCTTTA AAAATAAAGT 
2451 TCCCTGAATA TTTGAAAAAA AAAAAAAAAA AAAAAAAA 


WO 01/12659 
PCT/IB00/01496 


BLAST Results 


Entry HSCDN13 from database EMBL: 

H. sapiens (TL5) mRNA from LNCaP cell line 

Score = 1075, P = 5.8e-41, identities = 219/221 

Entry AF100470_1 from database TREMBLNEW: 

gene: "RAMP4"; product: "ribosome attached membrane protein 4"; Rattus 
norvegicus ribosome attached membrane protein 4 (RAMP4) mRNA, complete 
cds. 

Score = 331, P = 3.9e-28, identities = 66/66, positives = 66/66, frame 


Entry HSG19910 from database EMBL: 
human STS A002B48. 
Score = 530, P = 2.1e-17, identities = 108/109 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 316 bp to 513 bp; peptide length: 66 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic 


1 MVAKQRIRMA NEKHSKNITQ RGNVAKTSRN APEEKASVGP WLLALFIFVV 
51 CGSAIFQIIQ SIRMGM 
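
The ORF coordinates and the peptide length quoted above can be cross-checked by translating the stated interval. The following is a minimal sketch only: the nucleotide string is positions 301 to 520 copied from the listing above, the standard genetic code is assumed, and the names `REGION`, `CODON_TABLE` and `orf_peptide` are illustrative and not part of the original report.

```python
# Positions 301-520 of the DKFZphtes3_21l16 insert, copied from the listing above.
REGION_START = 301
REGION = (
    "GGTGGCGCCGCGAAGATGGTCGCCAAGCAAAGGATCCGTATGGCCAACGA"
    "GAAGCACAGCAAGAACATCACCCAGCGCGGCAACGTCGCCAAGACCTCGA"
    "GAAATGCCCCCGAAGAGAAGGCGTCTGTAGGACCCTGGTTATTGGCTCTC"
    "TTCATTTTTGTTGTCTGTGGTTCTGCAATTTTCCAGATTATTCAAAGTAT"
    "CAGGATGGGCATGTGAAGTG"
)

# Standard genetic code (assumed), built in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide(region, region_start, orf_start, orf_end):
    """Translate the 1-based, inclusive interval orf_start..orf_end."""
    orf = region[orf_start - region_start : orf_end - region_start + 1]
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i : i + 3]] for i in range(0, len(orf), 3))


peptide = orf_peptide(REGION, REGION_START, 316, 513)
print(len(peptide), peptide)
```

Under these assumptions the interval 316 to 513 spans 198 nucleotides, i.e. 66 codons, and the stop codon TGA follows directly at positions 514 to 516.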

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21l16, frame 1 

TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated 
membrane protein RAMP4"; Rattus norvegicus mRNA for ribosome 
associated membrane protein RAMP4, N = 1, Score = 331, P = 6.2e-30 

TREMBL:AF100470_1 gene: "RAMP4"; product: "ribosome attached membrane 
protein 4"; Rattus norvegicus ribosome attached membrane protein 4 
(RAMP4) mRNA, complete cds., N = 1, Score = 331, P = 6.2e-30 


>TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated membrane 
protein RAMP4"; Rattus norvegicus mRNA for ribosome associated membrane 
protein RAMP4 

Length = 75 

HSPs: 

Score = 331 (49.7 bits), Expect = 6.2e-30, P = 6.2e-30 
Identities = 66/66 (100%), Positives = 66/66 (100%) 

Query: 1 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 60 

MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 
Sbjct: 10 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 69 

Query: 61 SIRMGM 66 

SIRMGM 
Sbjct: 70 SIRMGM 75 

No Pedant data available 
DKFZphtes3_21n23 


group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with strong similarity to rat 7acomp 
protein. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


strong similarity to rat 7acomp protein 
on genomic level encoded by AF107885 
Sequenced by LMU 
Locus: /map="14q24.3" 
Insert length: 3122 bp 

Poly A stretch at pos. 3070, polyadenylation signal at pos. 3045 


1 GGAAAACCTC GTGGGCTCAG CCCGGGAGAA AGGGCCAGGG AAGTTGGGTG 
51 GTTCTGTGCT TGGTCTGTCA ATGGAGGAGA TCAAAGTTTT ACGAAGGGTG 
101 AAGGAGGAGA ATGATCGGCG AGGTGGATTT ATTCGCATAT TTCCTACATC 
151 TGAGACATGG GAAATATATG GGTCCTACCT CGAGCATAAG ACCTCAATGA 
201 ACTATATGCT GGCAACACGC CTCTTCCAGG ACAGGGGAAA CCCAAGAAGA 
251 AGCTTATTGA CAGGAAGAAC ACGAATGACT GCTGATGGAG CGCCAGAATT 
301 GAAGATAGAG AGTCTGAATT CAAAGGCCAA GCTGCATGCT GCACTTTACG 
351 AGAGGAAGCT CCTGTCTCTG GAGGTGCGAA AACGTAGACG ACGGAGTAGC 
401 AGATTGAGGG CAATGAGGCC AAAATACCCA GTGATTACCC AACCAGCTGA 
451 AATGAATGTT AAAACTGAGA CAGAGAGTGA AGAGGAGGAA GAAGTCGCAT 
501 TAGATAATGA AGATGAAGAA CAGGAGGCTT CCCAGGAGGA GTCTGCAGGA 
551 TTTCTTAGAG AAAATCAAGC CAAATATACA CCCTCATTGA CAGCTTTGGT 
601 AGAAAATACA CCCAAAGAAA ATTCCATGAA AGTTCGTGAA TGGAATAATA 
651 AAGGTGGACA CTGCTGCAAA CTTGAGACTC AGGAGCTAGA GCCTAAATTT 
701 AACCTGATGC AGATTCTTCA AGATAATGGC AATCTTAGCA AAATGCAGGC 
751 CCGAATAGCA TTCTCTGCCT ATCTCCAGCA TGTTCAAATT CGCCTGATGA 
801 AAGACAGTGG CGGTCAGACG TTCAGTGCCA GTTGGGCTGC CAAAGAGGAT 
851 GAACAGATGG AGCTGGTTGT TCGTTTCCTC AAGCGAGCAT CAAATAACCT 
901 CCAGCATTCA CTGAGGATGG TATTACCCAG TCGACGATTG GCACTTCTGG 
951 AACGCAGAAG AATCCTGGCC CACCAGCTGG GTGACTTTAT CATTGTATAC 
1001 AACAAGGAAA CAGAACAAAT GGCTGAAAAG AAATCAAAGA AGAAAGTTGA 
1051 GGAAGAAGAG GAAGATGGGG TGAATATGGA AAACTTTCAG GAGTTCATCA 
1101 GACAAGCAAG TGAGGCTGAA CTGGAGGAGG TGTTGACTTT TTATACCCAA 
1151 AAGAACAAGT CTGCTAGTGT CTTCCTGGGG ACTCACTCTA AAATTTCTAA 
1201 GAACAACAAC AATTATTCTG ATAGTGGGGC AAAAGGTGAT CACCCTGAGA 
1251 CTATAATGGA AGAAGTGAAA ATAAAGCCAC CTAAACAGCA ACAGACGACA 
1301 GAAATTCATT CTGATAAATT ATCTCGATTT ACCACTTCAG CAGAAAAAGA 
1351 GGCAAAATTA GTTTATAGCA ATTCCTCCTC TGGTCCTACT GCTACTCTGC 
1401 AGAAAATTCC CAACACCCAT TTGTCATCTG TTACAACCTC TGACCTCTCT 
1451 CCAGGGCCTT GCCACCATTC TTCTTTATCT CAAATTCCTT CAGCTATCCC 
1501 CAGCATGCCT CACCAGCCAA CAATTTTACT GAACACAGTC TCTGCCAGTG 
1551 CTTCTCCCTG CCTACATCCC GGGGCACAGA ACATCCCTAG CCCTACTGGC 
1601 CTGCCACGCT GTCGATCAGG AAGTCACACC ATTGGTCCCT TTTCTTCCTT 
1651 CCAAAGTGCT GCACACATCT ATAGCCAGAA ACTGTCTCGT CCCTCTTCAG 
1701 CAAAGGCAGG ATCGTGCTAT CTAAACAAGC ATCATTCAGG AATAGCCAAA 
1751 ACACAAAAAG AGGGAGAAGA TGCTTCTTTA TATAGCAAAC GGTACAACCA 
1801 AAGTATGGTT ACAGCTGAAC TTCAGCGGCT AGCTGAGAAG CAGGCAGCGA 
1851 GACAGTATTC TCCATCCAGC CACATCAACC TCCTCACCCA ACAGGTAACA 
1901 AACCTGAATT TGGCAACTGG CATCATAAAC AGAAGCAGTG CTTCAGCTCC 
1951 CCCAACCCTC CGACCCATCA TCAGTCCTAG TGGCCCGACA TGGTCTACAC 
2001 AGTCAGACCC CCAAGCTCCC GAGAATCACT CCAGCTCTCC TGGAAGCAGG 
2051 AGCCTGCAGA CAGGGGGATT TGCCTGGGAA GGAGAAGTAG AAAACAACGT 
2101 GTACAGCCAG GCTACAGGGG TGGTCCCCCA GCACAAGTAT CACCCCACAG 
2151 CAGGCAGCTA TCAGCTTCAA TTTGCCCTGC AGCAACTTGA ACAACAAAAA 
2201 CTTCAGTCCC GGCAGCTCCT GGACCAGAGT CGAGCCCGGC ACCAGGCAAT 
2251 CTTTGGCAGC CAGACACTAC CTAACTCCAA TTTATGGACA ATGAATAATG 
2301 GTGCAGGTTG TAGAATTTCC AGTGCCACAG CTAGTGGCCA GAAGCCAACC 
2351 ACTCTGCCAC AAAAAGTGGT ACCACCTCCA AGTTCTTGCG CCTCCCTGGT 
2401 TCCCAAACCC CCACCCAACC ACGAACAAGT GCTCAGAAGG GCAACATCCC 
2451 AGAAAGCTTC CAATACCCGC TTCAGATCCT CCTTTCAAAA CTATTTGTGG 
2501 TATTTCTTCC AAGCAGTCAG CTGAACTGAG GACGACAGCC TACAAACAAC 
2551 TACATGCATC TGAACTGTCT CTTGTAAATG AGCTTTTTTC AGAGCCAGAA 
2601 TCATACTCTC CAGGAAATAT GGAGAAAGAA ACCTGAGGAG ATTGAAGTTT 
2651 GCCAGGCACA AGGGCAAAAC TCAGACTGAA TGAATTTGAA AGGGTGGGGC 
2701 CAAAGATGTT GTAACCTGGG AGACTTCTCT GAAGAAAGAA AACTGTTTAA 
2751 GAAACACAGA CTGAACTGCA GTACTTTTCC TTAAATAGCT GAGATGACCT 
2801 TCTTTACCCT GGGCTTAGGT GATTCTCATC AGGGTGACCT GAGTGGAAGT 
2851 TGGTGGTAAC GACTGTTCTG TGTCAGCACC CAGGACAGTG GTGTCTGTTA 
2901 AGGCTGCCAG GGATTAGCAG GGAGGAAAGC CATCAGGACT GGGTAGCCTG 
2951 GTAGCACCAA ATCCCAATTA ATGTTACCTG AACATGTGGT GAGGTCAGCC 
3001 GTATGATGAA AGATGTTTAA GAGATTAATQ TCAGAAGAAT ATGAAAATAA 
3051 ACACCGGCTT AAAAAATGTT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3101 AAAAAAAAAA AAAAAAAAAA AA 


BLAST Results 


Entry AF107885 from database EMBL: 

Homo sapiens chromosome 14q24.3 clone BAC270M14 transforming growth 
factor-beta 3 (TGF-beta 3) gene, complete cds; and unknown genes. 
Score = 3042, P = 3.0e-219, identities = 610/612 
5 exons matching 1893-3070 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 71 bp to 2521 bp; peptide length: 817 
Category: strong similarity to known protein 


1 MEEIKVLRRV KEENDRRGGF IRIFPTSETW EIYGSYLEHK TSMNYMLATR 
51 LFQDRGNPRR SLLTGRTRMT ADGAPELKIE SLNSKAKLHA ALYERKLLSL 
101 EVRKRRRRSS RLRAMRPKYP VITQPAEMNV KTETESEEEE EVALDNEDEE 
151 QEASQEESAG FLRENQAKYT PSLTALVENT PKENSMKVRE WNNKGGHCCK 
201 LETQELEPKF NLMQILQDNG NLSKMQARIA FSAYLQHVQI RLMKDSGGQT 
251 FSASWAAKED EQMELVVRFL KRASNNLQHS LRMVLPSRRL ALLERRRILA 
301 HQLGDFIIVY NKETEQMAEK KSKKKVEEEE EDGVNMENFQ EFIRQASEAE 
351 LEEVLTFYTQ KNKSASVFLG THSKISKNNN NYSDSGAKGD HPETIMEEVK 
401 IKPPKQQQTT EIHSDKLSRF TTSAEKEAKL VYSNSSSGPT ATLQKIPNTH 
451 LSSVTTSDLS PGPCHHSSLS QIPSAIPSMP HQPTILLNTV SASASPCLHP 
501 GAQNIPSPTG LPRCRSGSHT IGPFSSFQSA AHIYSQKLSR PSSAKAGSCY 
551 LNKHHSGIAK TQKEGEDASL YSKRYNQSMV TAELQRLAEK QAARQYSPSS 
601 HINLLTQQVT NLNLATGIIN RSSASAPPTL RPIISPSGPT WSTQSDPQAP 
651 ENHSSSPGSR SLQTGGFAWE GEVENNVYSQ ATGVVPQHKY HPTAGSYQLQ 
701 FALQQLEQQK LQSRQLLDQS RARHQAIFGS QTLPNSNLWT MNNGAGCRIS 
751 SATASGQKPT TLPQKVVPPP SSCASLVPKP PPNHEQVLRR ATSQKASNTR 
801 FRSSFQNYLW YFFQAVS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21n23, frame 2 

TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds., N = 1, Score = 1845, P = 2.2e-190 

TREMBL:AF107885_3 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 443, P = 5.3e-41 

TREMBL:AF107885_4 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 265, P = 8.2e-22 


>TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds. 

Length = 436 

HSPs: 

Score = 1845 (276.8 bits), Expect = 2.2e-190, P = 2.2e-190 
Identities = 369/435 (84%), Positives = 395/435 (90%) 
Query: 115 MRPKYPVITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLT 174 

MRPKYPVIT PAEMN+KTETESEEEEEV LDNEDEEQEASQEESAG L ENQAKYTPSLT 
Sbjct: 1 MRPKYPVITLPAEMNIKTETESEEEEEVGLDNEDEEQEASQEESAGSIAENQAKYTPSLT 60 

Query: 175 ALVENTPKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAY 234 

+VEN+P+EN+MKV EW NKG CCK+ETQE E KFNLMQILQDNGNLSK+QAR+AFSAY 
Sbjct: 61 VIVENSPRENAMKVAEWTNKGESCCKXETQEPESKFNLMQILQDNGNLSKVQARLAFSAY 120 

Query: 235 LQHVQIRLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLE 294 

LQHVQ+RL KDSGGQT S SWAAKEDEQMELVVRFLKRAS+NLQHSLRMVLPSRRLALLE 
Sbjct: 121 LQHVQVRLTKDSGGQTLSPSWAAKEDEQMELVVRFLKRASSNLQHSLRMVLPSRRLALLE 180 

Query: 295 RRRILAHQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEV 354 

RRRILAHQLGDFI+VYNKETEQMAEKKSKKK+EEEEEDGVN E+FQEFIRQASEAELEEV 
Sbjct: 181 RRRILAHQLGDFIVVYNKETEQMAEKKSKKKLEEEEEDGVNAESFQEFIRQASEAELEEV 240 

Query: 355 LTFYTQKNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHS 414 

LTFYTQKNKSASVFLGTHSK SKN+++YSDSGAKGDHPETI +EVKIK PKQQQ TEIHS 
Sbjct: 241 LTFYTQKNKSASVFLGTHSKSSKNSSSYSDSGAKGDHPETI-QEVKIKQPKQQQATEIHS 299 

Query: 415 DKLSRFTTSAEKEAKLVYSNSSS--GPTATL-QKIPNTHLSSV-TTSDLSPGPCHHSSLS 470 

DKLSRFTTSA KEAKLVY+N SS GP A L Q++P+THLSS+ TTS LS GP HHSSLS 
Sbjct: 300 DKLSRFTTSAGKEAKLVYTNCSSFSGPAAVLLQRLPSTHLSSIITTSTLSSGPGHHSSLS 359 

Query: 471 QIPSAIPSMPHQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSA 530 

QI AIPSMPHQ +LLN V SASP +HPG N+ SP GLPRCRSGS+TIGPFSSFQSA 
Sbjct: 360 QISPAIPSMPHQSALLLNPVPDSASPPVHPGTPNV-SPAGLPRCRSGSYTIGPFSSFQSA 418 

Query: 531 AHIYSQKLSRPSSAKAG 547 

AHIYSQKLSRPSSAKAG 
Sbjct: 419 AHIYSQKLSRPSSAKAG 435 
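
The percentages printed next to the identity and positive counts in these reports appear to be rounded down to a whole number rather than rounded to the nearest one: 369/435 is about 84.8% and is printed as 84%. The short sketch below reproduces that behaviour with integer division. The rounding rule is inferred from the figures in this report and is an assumption, and `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, length: int) -> int:
    """Whole-number percentage, rounded down (assumed report convention)."""
    return (100 * matches) // length


# (matches, alignment length, percentage as printed in the reports above)
reported = [
    (369, 435, 84),   # Identities, 7acomp alignment
    (395, 435, 90),   # Positives, 7acomp alignment
    (66, 66, 100),    # Identities, RAMP4 alignment
]

for matches, length, printed in reported:
    print(f"{matches}/{length} -> {blast_percent(matches, length)}% (printed: {printed}%)")
```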


Pedant information for DKFZphtes3_21n23, frame 2 


Report for DKFZphtes3_21n23.2 


[LENGTH] 817 
[MW] 91522.09 
[pI] 9.32 
[HOMOL] TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein mRNA, complete cds. 1e-166 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 15 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 13.83 % 


SEQ MEEIKVLRRVKEENDRRGGFIRIFPTSETWEIYGSYLEHKTSMNYMLATRLFQDRGNPRR 

SEG 

PRD ccchhhhhhhhhhhccccceeeecccccceeeecceeeecccchhhhhhhhhhhcccccc 


SEQ SLLTGRTRMTADGAPELKIESLNSKAKLHAALYERKLLSLEVRKRRRRSSRLRAMRPKYP 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeeccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 


SEQ VITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLTALVENT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ceeeccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhccccceeeeeccc 


SEQ PKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAYLQHVQI 

SEG 

PRD cccccceeeeeccccccccchhhhhhhccchhhhhhhcccchhhhhhhhhhhhhhhhhhh 

SEQ RLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLERRRILA 

SEG xxxxxxxxxxxxxxx . 

PRD hhhhcccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhtihhhhhhhhhh 

SEQ HQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEVLTFYTQ 

SEG xxxxxxxxxxxxx 

PRD hhccceeeeeehhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhh 


SEQ KNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHSDKLSRF 

SEG 

PRD ccccceeeecccccccccccccccccccccccchhhhhhhccccccceeeeecccccccc 
SEQ TTSAEKEAKLVYSNSSSGPTATLQKIPNTHLSSVTTSDLSPGPCHHSSLSQIPSAIPSMP 
SEG 
PRD hhhhhhhheeeecccccccceeeecccccccccccccccccccccccccccccccccccc 

SEQ HQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSAAHIYSQKLSR 
SEG 
PRD cccceeeeccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccc 

SEQ PSSAKAGSCYLNKHHSGIAKTQKEGEDASLYSKRYNQSMVTAELQRLAEKQAARQYSPSS 
SEG 
PRD cccccccceeeecccccccccccccccceeeecchhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ HINLLTQQVTNLNLATGIINRSSASAPPTLRPIISPSGPTWSTQSDPQAPENHSSSPGSR 
SEG xxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccceeeecccccccccccccccccccccccccc 

SEQ SLQTGGFAWEGEVENNVYSQATGVVPQHKYHPTAGSYQLQFALQQLEQQKLQSRQLLDQS 
SEG xxxxxxxxxxxxxxxxxxxx 
PRD cccccccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RARHQAIFGSQTLPNSNLWTMNNGAGCRISSATASGQKPTTLPQKVVPPPSSCASLVPKP 
SEG 
PRD hhhhhhhhccccccccceeeeccccceeeeeeeccccccccccceeecccccceeecccc 

SEQ PPNHEQVLRRATSQKASNTRFRSSFQNYLWYFFQAVS 
SEG 
PRD cccchhhhhhhhhhhcccccccccccceeeeeeeccc 


Prosite for DKFZphtes3_21n23.2 

PS00001 221->225 ASN_GLYCOSYLATION PDOC00001 
PS00001 362->366 ASN_GLYCOSYLATION PDOC00001 
PS00001 381->385 ASN_GLYCOSYLATION PDOC00001 
PS00001 434->438 ASN_GLYCOSYLATION PDOC00001 
PS00001 576->580 ASN_GLYCOSYLATION PDOC00001 
PS00001 620->624 ASN_GLYCOSYLATION PDOC00001 
PS00001 652->656 ASN_GLYCOSYLATION PDOC00001 
PS00004 106->110 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 107->111 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 271->275 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 789->793 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 64->67 PKC_PHOSPHO_SITE PDOC00005 
PS00005 109->112 PKC_PHOSPHO_SITE PDOC00005 
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005 
PS00005 185->188 PKC_PHOSPHO_SITE PDOC00005 
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005 
PS00005 287->290 PKC_PHOSPHO_SITE PDOC00005 
PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005 
PS00005 359->362 PKC_PHOSPHO_SITE PDOC00005 
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005 
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005 
PS00005 543->546 PKC_PHOSPHO_SITE PDOC00005 
PS00005 561->564 PKC_PHOSPHO_SITE PDOC00005 
PS00005 572->575 PKC_PHOSPHO_SITE PDOC00005 
PS00005 629->632 PKC_PHOSPHO_SITE PDOC00005 
PS00005 793->796 PKC_PHOSPHO_SITE PDOC00005 
PS00006 35->39 CK2_PHOSPHO_SITE PDOC00006 
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006 
PS00006 134->138 CK2_PHOSPHO_SITE PDOC00006 
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006 
PS00006 180->184 CK2_PHOSPHO_SITE PDOC00006 
PS00006 347->351 CK2_PHOSPHO_SITE PDOC00006 
PS00006 394->398 CK2_PHOSPHO_SITE PDOC00006 
PS00006 422->426 CK2_PHOSPHO_SITE PDOC00006 
PS00006 455->459 CK2_PHOSPHO_SITE PDOC00006 
PS00006 561->565 CK2_PHOSPHO_SITE PDOC00006 
PS00006 643->647 CK2_PHOSPHO_SITE PDOC00006 
PS00007 563->572 TYR_PHOSPHO_SITE PDOC00007 
PS00008 195->201 MYRISTYL PDOC00008 
PS00008 248->254 MYRISTYL PDOC00008 
PS00008 510->516 MYRISTYL PDOC00008 
PS00008 557->563 MYRISTYL PDOC00008 
PS00008 746->752 MYRISTYL PDOC00008 
PS00008 756->762 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphtes3_21n23.2) 
DKFZphtes3_22c23 

group: testes derived 

DKFZphtes3_22c23 encodes a novel 223 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, 3 EST hits (two from a testis library) 
Sequenced by LMU 
Locus: /map="9q34" 
Insert length: 1113 bp 

Poly A stretch at pos. 1073, polyadenylation signal at pos. 1055 

1 GGTGGGCAAA GGCATCTTCC TCTGGGAAGG ACTGGCACAA GCACTTGGTC 
51 CCTGGGTTGT GTGCCTGGGA GGCCGGGATC AGGGCTGGCC CTCTTTCTCC 

101 CTGGCAAAGC AAAACCTCCC TTTTACTACT ATCAAGGGGA AGTAACTTGA 

151 AGGTGCCTGT GGCAGGCAGC ACCTTGAGCC AACAGGAACC ATTGACATGC 

201 GAGGCCCAGG GCAGGCAGAC TGTGCAGTGG CCATTGGGCG GCCCCTCGGG 

251 GAGGTGGTGA CCCTCCGCGT CCTTGAGAGT TCTCTCAACT GCAGTGCGGG 

301 GGACATGTTG CTGCTTTGGG GCCGGCTCAC CTGGAGGAAG ATGTGCAGGA 

351 AGCTGTTGGA CATGACTTTC AGCTCCAAGA CCAACACGCT GGTGGTGAGG 

401 CAGCGCTGCG GGCGGCCAGG AGGTGGGGTG CTGCTGCGGT ATGGGAGCCA 

451 GCTTGCTCCT GAAACCTTCT ACAGAGAATG TGACATGCAG CTCTTTGGGC 

501 CCTGGGGTGA AATCGTGAGC CCCTCGCTGA GTCCAGCCAC GAGTAATGCA 

551 GGGGGCTGCC GGCTCTTCAT TAATGTGGCT CCGCACGCAC GGATTGCCAT 

601 CCATGCCCTG GCCACCAACA TGGGCGCTGG GACCGAGGGA GCCAATGCCA 

651 GCTACATCTT GATCCGGGAC ACCCACAGCT TGAGGACCAC AGCGTTCCAT 

701 GGGCAGCAGG TGCTCTACTG GGAGTCAGAG AGCAGCCAGG CTGAGATGGA 

751 GTTCAGCGAG GGCTTCCTGA AGGCTCAGGC CAGCCTGCGG GGCCAGTACT 

801 GGACCCTCCA ATCATGGGTA CCGGAGATGC AGGACCCTCA GTCCTGGAAG 

851 GGAAAGGAAG GAACCTGAGG GTCATTGAAC ATTTGTTCCG TGTCTGGCCA 

901 GCCCTGGAGG GTTGACCCCT GGTCTCAGTG CTTTCCAATT CGAACTTTTT 

951 CCAATCTTAG GTATCTACTT TAGAGTCTTC TCCAATGTCC AAAAGGCTAG 
1001 GGGGTTGGAG GTGGGGACTC TGGAAAAGCA GCCCCCATTT CCTCGGGTAC 
1051 CAATAAATAA AACATGCAGG CTGAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1101 AAAAAAAAAA AAA 

BLAST Results 


Entry HSAC1644 from database EMBL: 

Genomic sequence from Human 9q34, complete sequence. 
Score = 2072, P = 8.8e-225, identities = 422/430 
5 exons Bp 41969-36232 


Medline entries 

No Medline entry 


Peptide information for frame 2 


ORF from 197 bp to 865 bp; peptide length: 223 
Category: putative protein 


1 MRGPGQADCA VAIGRPLGEV VTLRVLESSL NCSAGDMLLL WGRLTWRKMC 
51 RKLLDMTFSS KTNTLVVRQR CGRPGGGVLL RYGSQLAPET FYRECDMQLF 
101 GPWGEIVSPS LSPATSNAGG CRLFINVAPH ARIAIHALAT NMGAGTEGAN 
151 ASYILIRDTH SLRTTAFHGQ QVLYWESESS QAEMEFSEGF LKAQASLRGQ 
201 YWTLQSWVPE MQDPQSWKGK EGT
PCT/IB00/01496


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22c23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_22c23, frame 2 

Report for DKFZphtes3_22c23.2


[LENGTH] 223
[MW] 24546.19
[pI] 8.57
[PROSITE] MYRISTYL 4
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] Alpha_Beta


SEQ MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS 

PRD ccccccccceeeecccccceeeeehhhhhcccccchhhhhhchhhhhhhhhhhhhhhccc 

SEQ KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG

PRD ccceeeeeecccccccceeeeccccccchhhhhhhhhccccccceeeecccccccccccc 

SEQ CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS 

PRD ceeeeeecccceeehhhhhhhhccccccccceeeeeecccccceeecccceeeeeccccc 

SEQ QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT

PRD hhhhhhhcchhhhhhhhhhcccccccccccccccccccccccc 


Prosite for DKFZphtes3_22c23.2


PS00001   31->35    ASN_GLYCOSYLATION   PDOC00001
PS00001   150->154  ASN_GLYCOSYLATION   PDOC00001
PS00005   22->25    PKC_PHOSPHO_SITE    PDOC00005
PS00005   45->48    PKC_PHOSPHO_SITE    PDOC00005
PS00005   59->62    PKC_PHOSPHO_SITE    PDOC00005
PS00005   161->164  PKC_PHOSPHO_SITE    PDOC00005
PS00005   196->199  PKC_PHOSPHO_SITE    PDOC00005
PS00005   216->219  PKC_PHOSPHO_SITE    PDOC00005
PS00006   33->37    CK2_PHOSPHO_SITE    PDOC00006
PS00006   180->184  CK2_PHOSPHO_SITE    PDOC00006
PS00008   5->11     MYRISTYL            PDOC00008
PS00008   145->151  MYRISTYL            PDOC00008
PS00008   148->154  MYRISTYL            PDOC00008
PS00008   199->205  MYRISTYL            PDOC00008

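
The start positions listed for the N-glycosylation and protein kinase C sites can be reproduced by scanning the 223-residue peptide with regular expressions. This is a minimal sketch, not the Pedant implementation: the patterns are the commonly cited PROSITE forms for PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]), and the names `PEPTIDE`, `PATTERNS` and `scan` are hypothetical.

```python
import re

# Frame 2 peptide of DKFZphtes3_22c23 as given in the listing (223 residues)
PEPTIDE = (
    "MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMC"
    "RKLLDMTFSSKTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLF"
    "GPWGEIVSPSLSPATSNAGGCRLFINVAPHARIAIHALATNMGAGTEGAN"
    "ASYILIRDTHSLRTTAFHGQQVLYWESESSQAEMEFSEGFLKAQASLRGQ"
    "YWTLQSWVPEMQDPQSWKGKEGT"
)

# PROSITE-style patterns rewritten as regular expressions (assumed forms)
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
}


def scan(peptide: str, regex: str) -> list[int]:
    """Return 1-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() + 1 for m in re.finditer(f"(?={regex})", peptide)]


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

The table gives each site as start->end; the end values appear to be exclusive (for example 31->35 for a four-residue N-glycosylation site), which is why only start positions are compared here.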


(No Pfam data available for DKFZphtes3_22c23.2)


DKFZphtes3_22g2 


group: nucleic acid management 

DKFZphtes3_22g2 encodes a novel 1230 amino acid protein nearly identical to rat TIP120.

TATA-binding protein (TBP) is a central component of transcriptional regulation and a target
of various transcription regulators. TBP-interacting protein 120 (TIP120) interacts with TBP.
The novel protein is the human ortholog of rat TIP120 and is considered to participate in
transcription regulation through its interaction with TBP.

The new protein can find application in modulation of gene transcription. 


KIAA0829, complete cds, nearly identical to rat TIP120 
complete cDNA, complete cds, EST hits.
Sequenced by LMU 

Locus: /map="387.3 cR from top of Chr12 linkage group"
Insert length: 5387 bp 

Poly A stretch at pos. 5352, polyadenylation signal at pos. 5335 


1 GGGAGCGAGT GCGGAGCGAG TGGGAGCGAG ACGGCCCTGA GTGGAAGTGT 
51 CTGGCTCCCC GTAGAGGCCC TTCTGTACGC CCCGCCGCCC ATGAGCTCGT 
101 TCTCACGCGA ACAGCGCCGT CGTTAGGCTG GCTCTGTAGC CTCGGCTTAC 
151 CCCGGGACAG GCCCACGCCT CGCCAGGGAG GGGGCAGCCC GTCGAGGCGC 
201 CTCCCTAGTC AGCGTCGGCG TCGCGCTGCG ACCCTGGAAG CGGGAGCCGC 
251 CGCGAGCGAG AGGAGGAGCT CCAGTGGCGG CGGCGGCGGC GGCAGCGGCA 
301 GCGGGCAGCA GCTCCAGCAG CGCCAGCAGG CGGGATCGAG GCCGTCAACA
351 TGGCGAGCGC CTCGTACCAC ATTTCCAATT TGCTGGAAAA AATGACATCC 
401 AGCGACAAGG ACTTTAGGTT TATGGCTACA AATGATTTGA TGACGGAACT
451 GCAGAAAGAT TCCATCAAGT TGGATGATGA TAGTGAAAGG AAAGTAGTGA
501 AAATGATTTT GAAGTTATTG GAAGATAAAA ATGGAGAGGT ACAGAATTTA 
551 GCTGTCAAAT GTCTTGGTCC TTTAGTGAGT AAAGTGAAAG AATACCAAGT 
601 AGAGACAATT GTAGATACCC TCTGCACTAA CATGCTTTCT GATAAAGAAC 
651 AACTTCGAGA CATTTCAAGT ATTGGTCTTA AAACAGTAAT TGGAGAACTT 
701 CCTCCAGCTT CCAGTGGCTC TGCATTAGCT GCTAATGTAT GTAAAAAGAT 
751 TACTGGACGT CTTACAAGTG CAATAGCAAA ACAGGAAGAT GTCTCTGTTC 
801 AGCTAGAAGC CTTGGATATT ATGGCTGATA TGTTGAGCAG GCAAGGAGGA 
851 CTTCTTGTTA ATTTCCATCC TTCAATTCTG ACCTGTCTAC TTCCCCAGTT 
901 GACCAGCCCT AGACTTGCAG TGAGGAAAAG AACCATTATC GCTCTTGGCC 
951 ATCTGGTTAT GAGCTGTGGA AATATAGTTT TTGTAGATCT TATTGAACAT
1001 CTGTTGTCAG AGTTGTCCAA AAATGATTCT ATGTCAACAA CAAGAACCTA 
1051 CATACAATGT ATTGCTGCTA TTAGTAGGCA AGCTGGTCAT AGAATAGGTG 
1101 AATACCTTGA GAAGATAATT CCTTTGGTGG TAAAATTTTG CAATGTAGAT 
1151 GATGATGAAT TAAGAGAGTA CTGTATTCAA GCCTTTGAAT CATTTGTAAG 
1201 AAGATGTCCT AAGGAAGTAT ATCCTCATGT TTCTACCATT ATAAATATTT 
1251 GTCTTAAATA TCTTACCTAT GATCCAAATT ATAATTACGA TGATGAAGAT 
1301 GAAGATGAAA ATGCAATGGA TGCTGATGGT GGTGATGATG ATGATCAAGG 
1351 GAGTGATGAT GAATACAGTG ATGATGATGA CATGAGTTGG AAAGTGAGAC 
1401 GTGCAGCTGC GAAGTGCTTG GATGCTGTAG TTAGCACAAG GCATGAAATG 
1451 CTTCCAGAAT TCTACAAGAC CGTCTCTCCT GCACTAATAT CCAGATTTAA 
1501 AGAGCGTGAA GAGAATGTAA AGGCAGATGT TTTTCACGCA TACCTTTCTC 
1551 TTTTGAAGCA AACTCGTCCT GTACAAAGTT GGCTATGTGA CCCTGATGCA 
1601 ATGGAGCAGG GAGAAACACC TTTAACAATG CTTCAGAGTC AGGTTCCCAA 
1651 CATTGTTAAA GCTCTTCACA AACAGATGAA AGAAAAAAGT GTGAAGACCC 
1701 GACAGTGTTG TTTTAACATG TTAACTGAGC TGGTAAATGT ATTACCTGGG 
1751 GCCCTAACTC AACACATTCC TGTACTTGTA CCAGGAATCA TTTTCTCACT 
1801 GAATGATAAA TCAAGCTCAT CGAATTTGAA GATCGATGCT TTGTCATGTC 
1851 TATACGTTAT CCTCTGTAAC CATTCTCCTC AAGTCTTCCA TCCTCACGTT
1901 CAGGCTTTGG TTCCTCCAGT GGTGGCTTGT GTTGGAGACC CATTTTACAA 
1951 AATTACATCT GAAGCACTTC TTGTTACTCA ACAGCTTGTC AAAGTTATTC
2001 GTCCTTTAGA TCAGCCTTCC TCGTTTGATG CAACTCCTTA TATCAAAGAT 
2051 CTATTTACCT GTACCATTAA GAGATTAAAA GCAGCTGACA TTGATCAGGA 
2101 AGTCAAGGAA AGGGCTATTT CCTGTATGGG ACAAATTATT TGCAACCTTG 
2151 GAGACAATTT GGGTTCTGAC TTGCCTAATA CACTTCAGAT TTTCTTGGAG 
2201 AGACTAAAGA ATGAAATTAC CAGGTTAACT ACAGTAAAGG CATTGACACT 
2251 GATTGCTGGG TCACCTTTGA AGATAGATTT GAGGCCTGTT CTGGGAGAAG 
2301 GGGTTCCTAT CCTTGCTTCA TTTCTTAGAA AAAACCAGAG AGCTTTGAAA 
2351 CTGGGTACTC TTTCTGCCCT TGATATTCTA ATAAAAAACT ATAGTGACAG 
2401 CTTGACAGCT GCCATGATTG ATGCAGTTCT AGATGAGCTC CCACCTCTTA 
2451 TCAGCGAAAG TGATATGCAT GTTTCACAAA TGGCCATCAG TTTTCTTACC
2501 ACTTTGGCAA AAGTATATCC CTCCTCCCTT TCAAAGATAA GTGGATCCAT 
2551 TCTCAATGAA CTTATTGGAC TTGTGAGATC ACCCTTATTG CAGGGGGGAG
2601 CTCTTAGTGC CATGCTAGAC TTTTTCCAAG CTCTGGTTGT CACTGGAACA
2651 AATAATTTAG GATACATGGA TTTGTTGCGC ATGCTGACTG GTCCAGTTTA 
2701 CTCTCAGAGC ACAGCTCTTA CTCATAAGCA GTCTTATTAT TCCATTGCCA 
2751 AATGTGTAGC TGCCCTTACT CGAGCATGCC CTAAAGAGGG ACCAGCTGTA 
2801 GTAGGTCAGT TTATTCAAGA TGTCAAGAAC TCAAGGTCTA CAGATTCCAT 
2851 TCGTCTCTTA GCTCTACTTT CTCTTGGAGA AGTTGGGCAT CATATTGACT 
2901 TAAGTGGACA GTTGGAACTA AAATCTGTAA TACTAGAAGC TTTCTCATCT 
2951 CCTAGTGAAG AAGTCAAATC AGCTGCATCC TATGCATTAG GCAGCATTAG 
3001 TGTGGGCAAC CTTCCTGAAT ATCTGCCGTT TGTCCTGCAA GAAATAACTA 
3051 GTCAACCCAA AAGGCAGTAT CTTTTACTTC ATTCCTTGAA GGAAATTATT 
3101 AGCTCTGCAT CAGTGGTGGG CCTTAAACCA TATGTTGAAA ACATCTGGGC 
3151 CTTATTACTA AAGCACTGTG AGTGTGCAGA GGAAGGAACC AGAAATGTTG 
3201 TTGCTGAATG TCTAGGAAAA CTCACTCTAA TTGATCCAGA AACTCTCCTT 
3251 CCACGGCTTA AGGGGTACTT GATATCAGGC TCATCATATG CCCGAAGCTC 
3301 AGTGGTTACG GCTGTGAAAT TTACAATTTC TGACCATCCA CAACCTATTG 
3351 ATCCACTGTT AAAGAACTGC ATAGGTGATT TCCTAAAAAC TTTGGAAGAC 
3401 CCAGATTTGA ATGTGAGAAG AGTAGCCTTG GTCACATTTA ATTCAGCAGC
3451 ACATAACAAG CCATCATTAA TAAGGGATCT ATTGGATACT GTTCTTCCAC
3501 ATCTTTACAA TGAAACAAAA GTTAGAAAGG AGCTTATAAG AGAGGTAGAA 
3551 ATGGGTCCAT TTAAACATAC GGTTGATGAT GGTCTGGATA TTAGAAAGGC 
3601 AGCATTTGAG TGTATGTACA CACTTCTAGA CAGTTGTCTT GATAGACTTG 
3651 ATATCTTTGA ATTTCTAAAT CATGTTGAAG ATGGTTTGAA GGACCATTAT 
3701 GATATTAAGA TGCTGACATT TTTAATGTTG GTGAGACTGT CTACCCTTTG 
3751 TCCAAGTGCA GTACTGCAGA GGTTGGACCG ACTTGTTGAG CCATTACGTG 
3801 CAACATGTAC AACTAAGGTA AAGGCAAACT CAGTAAAGCA GGAGTTTGAA 
3851 AAACAAGATG AATTAAAGCG ATCTGCCATG AGAGCAGTAG CAGCACTGCT 
3901 AACCATTCCA GAAGCAGAGA AGAGTCCACT GATGAGTGAA TTCCAGTCAC 
3951 AGATCAGTTC TAACCCTGAG CTGGCGGCTA TCTTTGAAAG TATCCAGAAA 
4001 GATTCATCAT CTACTAACTT GGAATCAATG GACACTAGTT AGATGTTTGT 
4051 TCACCATGGG GACCATTACA TATGACCATA CAATGCACTG AATTGACAGG 
4101 TTAATCATAA GACATGGAAA GAGAAGTGTC TAAAAGCTTC AAAATGTTCC 
4151 ACTTTTTTTT CCTTCATGGA GACTGTTTGT TTGGCTTTCT TCCATTGTTG 
4201 TTTTTGTAGC ATTTATTTCA GAAATGTGTA TTTCCATAAT CCAGAGGTTG 
4251 TAAAACCACT AGTGTTTTAG TGGTTACAGC AACATTTGAA ATGGAAACTA 
4301 AAAGTTAGGA TTTTATGGAG TATGGAGATA GGGTCCAGTA TCTATTTACC 
4351 CTGTAATGTT TAGGATTAAA ATGTTAAAAT TTTGTGACCA TGAATTTCTT 
4401 TCTTTTATAA ATTTTCTCAT TTAAAAATCA AAAATCTTGC AAAACAAAAA 
4451 CCATGTTTCT TTTTCTTGTA TAACTTTTTG TTTTCAGCAA CATAAATTGA 
4501 TTTTTAGCTG GCAGACAAGA ATATCCATAT AAGATTTGTT AACCATTTCA 
4551 GAGAGTTTGG CAATTTTTAA AAGATAATAA GGTATCATTT TTAAGTATGA 
4601 AAATTAACAA TATCCCTGTT GCGCACACTA ATTTTGCATG AGTAAGTTTA 
4651 CAAATATGTA TCGTCTGTAA AGCAGCATGT GCAGATTATT CATAATATAG
4701 AAGTTAAAAT AAGTATTAGT GCAATTTTCA GATATTTATT TTTGCACAGA
4751 AAACACATTA TCTGGAGAGA AAGAAAGGAG AATTTTTGAG ACTTGGGTTT 
4801 TCTTAATGCC AGTGTGAATT TGCAGATGTT TTCAGAAAAT CAAGTCACAG 
4851 TAACAATTTG CCACTTTTTT CTATTATAAA TCTTCTTACT TAAATTTTGA 
4901 ATATTTAGTT TTTCTCAGTT ACCCATTTGT GTGTGTGTGA TTCCACTTAG
4951 AAATTCTTAA AACCAGATTT TTCTTTCATT CCGTTTGGAT GTCTACATTC 
5001 CTTATCAAAG GATATAAATA CTGTGTATGC TTTTGAATTT TATTTTTAGG 
5051 AAAATTCTGA AGCCAGCTAT CACAGGTTTG TTAGCTAATA ATAGTATTTT 
5101 CTTTTAGTTG AGTTAGGTTT TTCCCCATCT CCTGTAGAGC GAATTTACAT 
5151 ATTGTATTGG GTAAGTGTTC ACTACTTTTC CTGATTAAGG GATCTGTGCT 
5201 GGGGAACAAA GCTTTTGCAG TACCTTATAT TGTAGTTAAA ATTTTATTTA 
5251 ACATATCCTT CAGTGAGCTC ATTTCACACT GTAGCCTCTT CCTTAAAATT 
5301 TGTGGTGCTC CTGTAACAGT AAGAACTAAT TCTGAAATAA AAGACATCTC 
5351 CTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA


BLAST Results 


Entry HS793345 from database EMBL: 
human STS WI-12457. 
Score = 1985, P = 1.3e-83, identities = 433/460


Medline entries 


97127450: 

Molecular cloning of a novel 120-kDa TBP-interacting 
protein. 


Peptide information for frame 2 


ORF from 350 bp to 4039 bp; peptide length: 1230
Category: known protein
Classification: Nucleic acid management


1 MASASYHISN LLEKMTSSDK DFRFMATNDL MTELQKDSIK LDDDSERKVV 
51 KMILKLLEDK NGEVQNLAVK CLGPLVSKVK EYQVETIVDT LCTNMLSDKE 
101 QLRDISSIGL KTVIGELPPA SSGSALAANV CKKITGRLTS AIAKQEDVSV 
151 QLEALDIMAD MLSRQGGLLV NFHPSILTCL LPQLTSPRLA VRKRTIIALG 
201 HLVMSCGNIV FVDLIEHLLS ELSKNDSMST TRTYIQCIAA ISRQAGHRIG 
251 EYLEKIIPLV VKFCNVDDDE LREYCIQAFE SFVRRCPKEV YPHVSTIINI 
301 CLKYLTYDPN YNYDDEDEDE NAMDADGGDD DDQGSDDEYS DDDDMSWKVR 
351 RAAAKCLDAV VSTRHEMLPE FYKTVSPALI SRFKEREENV KADVFHAYLS 
401 LLKQTRPVQS WLCDPDAMEQ GETPLTMLQS QVPNIVKALH KQMKEKSVKT 
451 RQCCFNMLTE LVNVLPGALT QHIPVLVPGI IFSLNDKSSS SNLKIDALSC 
501 LYVILCNHSP QVFHPHVQAL VPPVVACVGD PFYKITSEAL LVTQQLVKVI 
551 RPLDQPSSFD ATPYIKDLFT CTIKRLKAAD IDQEVKERAI SCMGQIICNL 
601 GDNLGSDLPN TLQIFLERLK NEITRLTTVK ALTLIAGSPL KIDLRPVLGE 
651 GVPILASFLR KNQRALKLGT LSALDILIKN YSDSLTAAMI DAVLDELPPL 
701 ISESDMHVSQ MAISFLTTLA KVYPSSLSKI SGSILNELIG LVRSPLLQGG 
751 ALSAMLDFFQ ALVVTGTNNL GYMDLLRMLT GPVYSQSTAL THKQSYYSIA 
801 KCVAALTRAC PKEGPAVVGQ FIQDVKNSRS TDSIRLLALL SLGEVGHHID
851 LSGQLELKSV ILEAFSSPSE EVKSAASYAL GSISVGNLPE YLPFVLQEIT 
901 SQPKRQYLLL HSLKEIISSA SVVGLKPYVE NIWALLLKHC ECAEEGTRNV 
951 VAECLGKLTL IDPETLLPRL KGYLISGSSY ARSSVVTAVK FTISDHPQPI 
1001 DPLLKNCIGD FLKTLEDPDL NVRRVALVTF NSAAHNKPSL IRDLLDTVLP
1051 HLYNETKVRK ELIREVEMGP FKHTVDDGLD IRKAAFECMY TLLDSCLDRL 
1101 DIFEFLNHVE DGLKDHYDIK MLTFLMLVRL STLCPSAVLQ RLDRLVEPLR 
1151 ATCTTKVKAN SVKQEFEKQD ELKRSAMRAV AALLTIPEAE KSPLMSEFQS
1201 QISSNPELAA IFESIQKDSS STNLESMDTS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22g2, frame 2

TREMBL:AB020636_1 gene: "KIAA0829"; product: "KIAA0829 protein"; Homo
sapiens mRNA for KIAA0829 protein, partial cds., N = 1, Score = 5986, P
= 0

TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus
mRNA for TIP120, complete cds., N = 1, Score = 6203, P = 0


>TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA
for TIP120, complete cds.
Length = 1,230

HSPs:

Score = 6203 (930.7 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 1227/1230 (99%), Positives = 1228/1230 (99%)

Query: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60

MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK
Sbjct: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60

Query: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120

NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA
Sbjct: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120

Query: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180

SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL
Sbjct: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180

Query: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240

LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA
Sbjct: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240

Query: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300

ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI
Sbjct: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300

Query: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360

CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV
Sbjct: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360

Query: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420

VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ
Sbjct: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420

Query: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480

GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI
Sbjct: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

Query: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540

IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL
Sbjct: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540

Query: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 
Sbjct: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

Query: 601 GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR
Sbjct: 601 GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

Query: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720

KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA
Sbjct: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720

Query: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780

KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT
Sbjct: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780

Query: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 
Sbjct: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

Query: 841 slgevghhidlsgqlelksvileafsspseevksaasyalgsisvgnlpeylpfvlqeit 900 

SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
Sbjct: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

Query: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960

SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL
Sbjct: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960

Query: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020

IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL
Sbjct: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020

Query: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

NVRRVALVTFNSAAHNKPSLIRDLLD+VLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 
Sbjct: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDSVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

Query: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 
Sbjct: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

Query: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS
Sbjct: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

Query: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 

QISSNPELAAIFESIQKDSSSTNLESMDTS 
Sbjct: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 
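
The identity and positive counts in the HSP header can be recomputed from the three lines of an alignment block. The sketch below uses the 601-660 block of this alignment, which contains two of the three mismatches. The function names are hypothetical, and truncating the percentage to an integer is an assumption chosen because it reproduces the 99% printed in the report.

```python
def alignment_stats(query: str, midline: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, positives, aligned length) for one ungapped block."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("all three lines must have the same length")
    identities = sum(q == s for q, s in zip(query, sbjct))
    # In the midline a letter marks an identity, '+' a conservative change,
    # and a blank a mismatch, so every non-blank column counts as a positive.
    positives = sum(m != " " for m in midline)
    return identities, positives, len(query)


def percent(n: int, total: int) -> int:
    """Integer percentage, truncated (assumed to match the report's rounding)."""
    return 100 * n // total


QUERY = "GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR"
MIDLN = "GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR"
SBJCT = "GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR"

print(alignment_stats(QUERY, MIDLN, SBJCT))
print(percent(1227, 1230), percent(1228, 1230))
```

Summing the per-block counts over all 21 blocks would give the totals 1227/1230 and 1228/1230 quoted in the header.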

Pedant information for DKFZphtes3_22g2, frame 2 

Report for DKFZphtes3_22g2.2

[LENGTH] 1230
[MW] 136376.58
[pI] 5.52
[HOMOL] TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA for
TIP120, complete cds. 0.0
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.28 %

SEQ MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 
SEG 

PRD cccccchhhhhhhhhcccccceeeeehhhhhhhhhcccccccccchhhhhhhhhhhhhcc
MEM

SEQ NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA

SEG xxxx

PRD ccccceeeeeeeeceeeeehhhhhhhhhhhhccchhhhhcccccccchhhhhhhhhcccc
MEM


SEQ SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 
SEG xxxxxxxx 

PRD cccccchhhhhhhccchhhhhhhccccchhhhhhhhhhhhlihhhhccceeeec^^

MEM

SEQ LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA

SEG

PRD hcccccchhhhhhhhhhhheeeeecccceeehhhhhhhhhhhhccccchhhhh^ 

MEM MMMMMMMMMMMMMMMMM 

SEQ ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI

SEG

PRD
MEM

SEQ 
SEG 
PRD 
MEM 

SEQ 

SEG 


hhhhcccccccchhhhhhhhheeeeccchhhhhhhhhhhhhhhhccccceeecchhh^^ 

CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 

*^'««*«^x«xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

hhhhhccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhh 
VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 

SEQ GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 
SEG 


PRD 
MEM 


cccchhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhccccccci 


cceeeecce 


SEQ IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL

SEG xxxxxxxxxxxxxxxx 

MEM ®®^®^^'=^^*=*=^^*^^**^^^^^eeeeecccccccccceeeeecc4eeeecccchhhh^^ 

SEQ LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 

SEG , 

PRD h^ihhhhhhhhcccccccccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhheee^ 

MEM 

SEQ GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 

SEG

PRD cccccccccchhhhhhhhhcchhhhhhhhhhhheeeeccccccccceeehhhhhhhhhhh

MEM

SEQ KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhhh^

MEM


SEQ KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 

SEG 

PRD cccccceeecchhhhhhhhhhhccccccchhhhhhhhhhhheeeecccccc^ 

MEM •••••••••••»....,........ 

SEQ GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhh^^ 

MEM 

SEQ SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
SEG 

PRD hccccccccccccccccceeeeeeccccchhhhhhhhhhhccccccccccchh^ 
MEM 

SEQ SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL

SEG

PRD cccchhhhhhhhhhhhhhcccceeehhhhhhhhhhhhhhhhcccccceeeeec^

MEM

SEQ IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhccccccccccchhhh^^ 

MEM 

SEQ NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD

SEG

PRD ccceeeeeeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh^

MEM

SEQ IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ

SEG 

PRD hhhhhhhhhhhhhhhccccccceeeecccccccccchhhhhhhhhhhhhhhhcccchhhh 

MEM 

SEQ RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 

SEG 

PRD hhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhh 

MEM 

SEQ QISSNPELAAIFESIQKDSSSTNLESMDTS 
SEG 

PRD hhhccchhhhhhhhhhhccccccccccccc 

MEM 


(No Prosite data available for DKFZphtes3_22g2.2)
(No Pfam data available for DKFZphtes3_22g2.2)
DKFZphtes3_22nl3


group: testes derived 

DKFZphtes3_22nl3 encodes a novel 677 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 


dJ1042K10.3, complete
Sequenced by LMU
Locus: /map="22q13.1-13.2"
Insert length: 3353 bp 

Poly A stretch at pos. 3315, polyadenylation signal at pos. 3298 


1 ATGGAACCAC TATCCCCACT GCCAAGTCCA CCCCCACACT CATTAAGCAA 
51 AGCCAACCCA AGTCTGCCAG TGAGAAGTCA CAGCGCAGCA AGAAGGCCAA 
101 GGAGCTGAAG CCAAAGGTGA AGAAGCTCAA GTACCACCAG TACATCCCCC 
151 CGGACCAGAA GCAGGACAGG GGGGCACCCC CCATGGACTC ATCCTACGCC 
201 AAGATCCTGC AGCAGCAGCA GCTCTTCCTC CAGCTGCAGA TCCTCAACCA 
251 GCAGCAGCAG CAGCACCACA ACTACCAGGC CATCCTGCCT GCCCCGCCAA 
301 AGTCAGCAGG CGAGGCCCTG GGAAGCAGCG GGACCCCCCC AGTACGCAGC 
351 CTCTCCACTA CCAATAGCAG CTCCAGCTCG GGCGCCCCTG GGCCCTGTGG 
401 GCTGGCACGT CAGAACAGCA CCTCACTGAC TGGCAAGCCG GGAGCCCTGC 
451 CGGCCAACCT GGACGACATG AAGGTGGCAG AGCTGAAGCA GGAGCTGAAG 
501 TTGCGATCAC TGCCTGTCTC GGGCACCAAA ACTGAGCTGA TTGAGCGCCT 
551 TCGAGCCTAT CAAGACCAAA TCAGCCCTGT GCCAGGAGCC CCCAAGGCCC 
601 CTGCCGCCAC CTCTATCCTG CACAAGGCTG GCGAGGTGGT GGTAGCCTTC 
651 CCAGCGGCCC GGCTGAGCAC GGGGCCAGCC CTGGTGGCAG CAGGCCTGGC 
701 TCCAGCTGAG GTGGTGGTGG CCACGGTGGC CAGCAGTGGG GTGGTGAAGT 
751 TTGGCAGCAC GGGCTCCACG CCCCCCGTGT CTCCCACCCC CTCGGAGCGC 
801 TCACTGCTCA GCACGGGCGA TGAAAACTCC ACCCCCGGGG ACACCTTTGG 
851 TGAGATGGTG ACATCACCTC TGACGCAGCT GACCCTGCAG GCCTCGCCAC 
901 TGCAGATCCT CGTGAAGGAG GAGGGCCCCC GGGCCGGGTC CTGTTGCCTG 
951 AGCCCTGGGG GGCGGGCGGA GCTAGAGGGG CGCGACAAGG ACCAGATGCT 
1001 GCAGGAGAAA GACAAGCAGA TCGAGGCGCT GACGCGCATG CTCCGGCAGA 
1051 AGCAGCAGCT GGTGGAGCGG CTCAAGCTGC AGCTGGAGCA GGAGAAGCGA 
1101 GCCCAGCAGC CCGCCCCCGC CCCCGCCCCC CTCGGCACCC CCGTGAAGCA 
1151 GGAGAACAGC TTCTCCAGCT GCCAGCTGAG CCAGCAGCCC CTGGGCCCCG 
1201 CTCACCCATT CAACCCCAGC CTGGCGGCCC CAGCCACCAA CCACATAGAC 
1251 CCTTGTGCTG TGGCCCCAGG GCCCCCGTCC GTGGTGGTGA AGCAGGAAGC 
1301 CTTGCAGCCT GAGCCCGAGC CGGTCCCCGC CCCCCAGTTG CTTCTGGGGC 
1351 CTCAGGGCCC CGGCCTCATC AAGGGGGTTG CACCTCCCAC CCTCATCACC 
1401 GACTCCACAG GGACCCACCT TGTCCTCACC GTGACCAATA AGAATGCAGA 
1451 CAGGCCTGGC CTGTCCAGTG GGAGCCCCCA GCAGCCCTCG TCCCAGCCTG 
1501 GCTCTCCAGC GCCTGCCCCC TCTGCCCAGA TGGACCTGGA GCACCCACTG 
1551 CAGCCCCTCT TTGGGACCCC CACTTCTCTG CTGAAGAAGG AACCACCTGG 
1601 CTATGAGGAA GCCATGAGCC AGCAGCCCAA ACAGCAGGAA AATGGTTCCT 
1651 CAAGCCAGCA GATGGACGAC CTGTTTGACA TTCTCATTCA GAGCGGAGAA 
1701 ATTTCAGCAG ATTTCAAGGA GCCGCCATCC CTGCCAGGGA AGGAGAAGCC 
1751 ATCCCCGAAG ACAGTCTGTG GGTCCCCCCT GGCAGCACAG CCATCACCTT 
1801 CTGCTGAGCT CCCCCAGGCT GCCCCACCTC CTCCAGGCTC ACCCTCCCTC 
1851 CCTGGACGCC TGGAGGACTT CCTGGAGAGC AGCACGGGGC TGCCCCTGCT 
1901 GACCAGTGGG CATGACGGGC CAGAGCCCCT TTCCCTCATT GACGACCTCC 
1951 ATAGCCAGAT GCTGAGCAGC ACTGCCATCC TGGACCACCC CCCGTCACCC 
2001 ATGGACACCT CGGAATTGCA CTTTGTTCCT GAGCCCAGCA GCACCATGGG 
2051 CCTGGACCTG GCTGATGGCC ACCTGGACAG CATGGACTGG CTGGAGCTGT 
2101 CGTCAGGTGG TCCCGTGCTG AGCCTAGCCC CCCTCAGCAC CACAGCCCCC 
2151 AGCCTCTTCT CCACAGACTT CCTCGATGGC CATGATTTGC AGCTGCACTG 
2201 GGATTCCTGC TTGTAGCTCT CTGGCTCAAG ACGGGGTGGG GAAGGGGCTG 
2251 GGAGCCAGGG TACTCCAATG CGTGGCTCTC CTGCGTGATT CGGCCTCTCC 
2301 ACATGGTTGT GAGTCTTGAC AATCACAGCC CCTGCTTTTT CCCTTCCCTG 
2351 GGAGGCTAGA ACAGAGAAGC CCTTACTCCT GGTTCAGTGC CACGCAGGGC 
2401 AGAGGAGAGC AGCTGTCAAG AAGCAGCCCT GGCTCTCACG CTGGGGTTTT 
2451 GGACACACGG TCAGGGTCAG GGCCATTTCA GCTTGACCTC CTTTTTTGAG 
2501 GTCAGGGGGC ACTGTCTGTC TGGCTACAAT TTGGCTAAGG TAGGTGAAGC 
2551 CTGGCCAGGC GGGAGGCTTC TCTTCTGACC CAGGGCTGAG ACAGGTTAAG 
2601 GGGTGAATCT CCTTCCTTTC TCTCCCTGCT TTGCTGTGAA GGGAGAAATT 
2651 AGCCTGGGCC TCTACCCCCT ATTCCCTGTG TCTGCCAACC CCAGGATCCC 
2701 AGGGCTCCCT GCCATTTTAG TGTCTTGGTG TAGTGTAACC ATTTAGTGGT 
2751 TGGTGGCAAC AATTTTATGT ACAGGTGTAT ATACCTCTAT ATTATATATC 
2801 GACATACATA TATATTTTTG GGGGGGGGCG GACAGGAGAT GGGTGCAACT
2851 CCCTCCCATC CTACTCTCAC AGAAGGGCCT GGATGCAAGG TTACCCTTGA
2901 GCTGTGTGCC ACAGTCTGGT GCCCAGTCTG GCATGCAGCT ACCCAGGCCC 
2951 ACCCATCACG TGTGATTGAC ATGTAGGTAC CCTGCCACGG CCTATGCCCC 
3001 ACCTGCCCTG CTTCCTGGCT CCTTATCAGT GCCATGAGGG CAGAGGTGCT 
3051 ACCTGGCCTT CCTGCCAGGA GCTCTCCACC CACTCACATT CCGTCCCCGC 
3101 CGCCTCACTG CAGCCAGCGT GGCCCTAGGA CAGGAGGAGC TTCGGGCCCA 
3151 GCTTCACCCT GCGGTGGGGC TGAGGGGTGG CCATCTCCTG CCCTGGGGCC 
3201 ACTGGCTTCA CATTCTGGGC TGACTCATAG GGGAGTAGGG GTGGAGTCAC 
3251 CAAAACCAGT GCTGGGACAA AGATGGGGAA GGTGTGTGAA CTTTTTAAAA 
3301 TAAACACAAA AACACAGGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3351 AAG 
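
The polyadenylation signal reported for this insert can be located by searching the 3' end for the canonical hexamer AATAAA. The sketch below works on bases 3251 to 3330 of the sequence above; the names `TAIL`, `TAIL_START` and `find_polya_signal` are hypothetical. With 1-based numbering the hexamer starts at base 3299, one base after the reported position 3298, which presumably reflects the coordinate convention of the annotation tool (an assumption, not something the listing states).

```python
TAIL_START = 3251  # 1-based coordinate of the first base in TAIL
TAIL = (
    "CAAAACCAGTGCTGGGACAAAGATGGGGAAGGTGTGTGAACTTTTTAAAA"  # 3251-3300
    "TAAACACAAAAACACAGGAAAAAAAAAAAA"                      # 3301-3330
)


def find_polya_signal(seq: str, offset: int = 1, signal: str = "AATAAA"):
    """Return the 1-based coordinate of the last occurrence of `signal`, or None."""
    i = seq.rfind(signal)  # the signal nearest the poly A tail is the relevant one
    return None if i < 0 else offset + i


print(find_polya_signal(TAIL, offset=TAIL_START))
```

Variant hexamers such as ATTAAA are not considered in this sketch.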


BLAST Results 


Entry HS1042K10 from database EMBL: 

Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2.
Contains the ADSL gene for Adenylosuccinate lyase (EC 4.3.2.2, 
Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3) . Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Score = 7997, P = 0.0e+00, identities = 1617/1645
7 exons 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 183 bp to 2213 bp; peptide length: 677
Category: similarity to unknown protein 
Classification: unclassified 


1 MDSSYAKILQ QQQLFLQLQI LNQQQQQHHN YQAILPAPPK SAGEALGSSG 
51 TPPVRSLSTT NSSSSSGAPG PCGLARQNST SLTGKPGALP ANLDDMKVAE 
101 LKQELKLRSL PVSGTKTELI ERLRAYQDQI SPVPGAPKAP AATSILHKAG
151 EVVVAFPAAR LSTGPALVAA GLAPAEVVVA TVASSGVVKF GSTGSTPPVS
201 PTPSERSLLS TGDENSTPGD TFGEMVTSPL TQLTLQASPL QILVKEEGPR 
251 AGSCCLSPGG RAELEGRDKD QMLQEKDKQI EALTRMLRQK QQLVERLKLQ 
301 LEQEKRAQQP APAPAPLGTP VKQENSFSSC QLSQQPLGPA HPFNPSLAAP 
351 ATNHIDPCAV APGPPSVVVK QEALQPEPEP VPAPQLLLGP QGPGLIKGVA
401 PPTLITDSTG THLVLTVTNK NADSPGLSSG SPQQPSSQPG SPAPAPSAQM 
451 DLEHPLQPLF GTPTSLLKKE PPGYEEAMSQ QPKQQENGSS SQQMDDLFDI 
501 LIQSGEISAD FKEPPSLPGK EKPSPKTVCG SPLAAQPSPS AELPQAAPPP 
551 PGSPSLPGRL EDFLESSTGL PLLTSGHDGP EPLSLIDDLH SQMLSSTAIL 
601 DHPPSPMDTS ELHFVPEPSS TMGLDLADGH LDSMDWLELS SGGPVLSLAP 
651 LSTTAPSLFS TDFLDGHDLQ LHWDSCL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22nl3, frame 3 

TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel
protein)"; Human DNA sequence from clone 1042K10 on chromosome
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable
rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs
and a putative CpG island., N = 1, Score = 1285, P = 4.9e-131

TREMBL:CEUK06A9_3 gene: "K06A9.1a"; Caenorhabditis elegans cosmid
K06A9., N = 2, Score = 149, P = 1.3e-09

TREMBLNEW:SSI132828_1 product: "p210 protein"; Spermatozopsis similis
mRNA for p210 protein, partial, N = 1, Score = 171, P = 2.8e-09


>TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel
protein)"; Human DNA sequence from clone 1042K10 on chromosome
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a
putative CpG island.
Length = 243


HSPs:


Score = 1285 (192.8 bits), Expect = 4.9e-131, P = 4.9e-131
Identities = 243/243 (100%), Positives = 243/243 (100%)

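Query: 435 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 494

PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM
Sbjct: 1 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 60

Query: 495 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 554

DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP
Sbjct: 61 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 120

Query: 555 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 614

SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF
Sbjct: 121 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 180

Query: 615 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 674

VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD
Sbjct: 181 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 240

Query: 675 SCL 677

SCL
Sbjct: 241 SCL 243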


Pedant information for DKFZphtes3_22nl3, frame 3
Report for DKFZphtes3_22nl3.3

[LENGTH] 677
[MW] 70743.01
[pI] 4.93
[HOMOL] Adenylosuccinate lyase (EC 4.3.2.2, Adenylosuccinase, ASL)
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 21.57 %
[KW] COILED_COIL 4.58 %

SEQ MDSSYAKILQQQQLFLQLQILNQQQQQHHNYQAILPAPPKSAGEALGSSGTPPVRSLSTT

SEG xxxxxxxxxxxxxxxxxxx

PRD

COILS

MEM

SEQ NSSSSSGAPGPCGLARQNSTSLTGKPGALPANLDDMKVAELKQELKLRSLPVSGTKTELI

SEG

PRD cccccccccccceeecccccccccccccccccccchhhhhhhhhhhhhhcccccchhhhh

COILS

MEM

SEQ ERLRAYQDQISPVPGAPKAPAATSILHKAGEVVVAFPAARLSTGPALVAAGLAPAEVVVA

SEG

PRD

COILS

MEM MMMMMMMMMMMMMMMMMMMMMM

SEQ TVASSGVVKFGSTGSTPPVSPTPSERSLLSTGDENSTPGDTFGEMVTSPLTQLTLQASPL

SEG xxxxxxxx..xxxxxxxxxxxxxx

PRD

COILS

MEM

SEQ QILVKEEGPRAGSCCLSPGGRAELEGRDKDQMLQEKDKQIEALTRMLRQKQQLVERLKLQ

SEG

PRD eeeeeccccccccccccccccccccccchhhhhhhhhhAhhh^^

COILS cccccccccccccccccccccccc

MEM

SEQ LEQEKRAQQPAPAPAPLGTPVKQENSFSSCQLSQQPLGPAHPFNPSLAAPATNHIDPCAV




WO 01/12659




SEG xxxxxxxxxx 

PRD hhhhhhhhhcccccccccccccccccceeeeecccccccccccccceeeccccccccccc 

COILS ccccccc 

MEM 

SEQ APGPPSVVVKQEALQPEPEPVPAPQLLLGPQGPGLIKGVAPPTLITDSTGTHLVLTVTNK 

SEG xxxxxxxxxxxx 

PRD cccccceeeeeccccccccccccceeeccccccceeeeecccccccccccceeeeeeecc 

COILS 

MEM

SEQ NADSPGLSSGSPQQPSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQ 

SEG . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccccccccccc 

COILS 

MEM 

SEQ QPKQQENGSSSQQMDDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPS 

SEG , . . , .xxxxxxxxxxx 

PRD ccccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

COILS 

MEM

SEQ AELPQAAPPPPGSPSLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAIL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhcccceee 

COILS 

MEM

SEQ DHPPSPMDTSELHFVPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFS 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeeeccccceeeeeecccccccccc 

COILS 

MEM

SEQ TDFLDGHDLQLHWDSCL 

SEG 

PRD cccccccceeecccccc 

COILS 

MEM 

(No Prosite data available for DKFZphtes3_22nl3.3)
(No Pfam data available for DKFZphtes3_22nl3.3)
DKFZphtes3_23111


group: intracellular transport and trafficking 

DKFZphtes3_23111 encodes a novel 186 amino acid protein nearly identical to mouse
ADP-ribosylation-like factor homolog 6 (Arl6).

Protein secretion through the endoplasmic reticulum and the Golgi vesicular trafficking system
is initiated by the binding of ADP-ribosylation factors (ARFs) to donor membranes, leading to
recruitment of coatomer, bud formation, and eventual vesicle release. ARFs are approximately
20-kDa GTPases that are active with bound GTP and inactive with bound GDP. The novel protein
contains an ATP/GTP-binding site motif A (P-loop) and seems to be a novel ARF. It seems to
have an important role in vesicular transport and trafficking.

The new protein can find application in modulating vesicle transport and trafficking in cells. 


nearly identical to mouse Arl6, ADP-ribosylation-like factor homolog
start at bp 15 matches Kozak consensus ANNatgG
Sequenced by LMU
Locus: unknown
Insert length: 717 bp 

Poly A stretch at pos. 689, no polyadenylation signal found 
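
The statement that the start at bp 15 matches the consensus ANNatgG can be checked directly on the first bases of the insert. This is a minimal sketch with a hypothetical function name, `matches_kozak`; it tests only the two positions named in the listing (A at -3 and G at +4 relative to the A of ATG) and assumes 1-based coordinates.

```python
def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Check the ANNatgG context around an ATG whose A is at 1-based `atg_pos`."""
    i = atg_pos - 1  # 0-based index of the A in ATG
    if i < 3 or i + 4 > len(seq):
        return False  # not enough flanking sequence to test -3 and +4
    return seq[i:i + 3] == "ATG" and seq[i - 3] == "A" and seq[i + 3] == "G"


# First 30 bases of the DKFZphtes3_23111 insert
HEAD = "ATTTGAATCACATTATGGGATTGCTAGACA"
print(matches_kozak(HEAD, 15))
```

Bases 12 to 18 read ATTATGG, so both tested positions agree with the consensus.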

1 ATTTGAATCA CATTATGGGA TTGCTAGACA GACTTTCAGT CTTGCTTGGC 
51 CTGAAGAAGA AGGAGGTTCA TGTTTTGTGC CTTGGGCTAG ATAATAGTGG 
101 CAAAACGACG ATCATTAACA AACTTAAACC TTCAAATGCT CAATCTCAAA 
151 ATATCCTTCC AACAATAGGA TTCAGCATAG AGAAATTCAA ATCATCCAGT 
201 TTGTCATTTA CAGTGTTTGA CATGTCAGGT CAAGGAAGAT ACAGAAATCT 
251 CTGGGAACAC TATTATAAAG AAGGCCAAGC TATTATTTTT GTCATTGATA 
301 GTAGTGATAG ATTAAGAATG GTTGTGGCCA AAGAAGAACT CGATACTCTT 
351 CTGAATCATC CAGATATTAA ACACCGTCGA ATTCCAATCT TATTCTTTGC 
401 AAATAAAATG GATCTTAGAG ATGCAGTGAC ATCTGTAAAA GTGTCTCAGT 
451 TGCTGTGTTT AGAGAACATC AAAGATAAAC CCTGGCATAT TTGTGCTAGT 
501 GATGCCATAA AAGGAGAAGG CTTGCAAGAA GGTGTAGACT GGCTTCAAGA 
551 TCAGATCCAG ACTGTGAAGA CATGAAAAGA TAATAGTTGG AAACCTCAGC 
601 AATTTTCAAT TCAAGGAATC TATCTAAGAC AAATAGAATA CATTTTGTAA 
651 AAGATGTTTA TGCATCAAAA AATATAATTT TCTGCTTGCA AAAAAAAAAA 
701 AAAAAAAAAA AAAAAAG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 15 bp to 572 bp; peptide length: 186 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic
Prosite motifs: ATP_GTP_A (24-32)


1 MGLLDRLSVL LGLKKKEVHV LCLGLDNSGK TTIINKLKPS NAQSQNILPT 
51 IGFSIEKFKS SSLSFTVFDM SGQGRYRNLW EHYYKEGQAI IFVIDSSDRL 
101 RMVVAKEELD TLLNHPDIKH RRIPILFFAN KMDLRDAVTS VKVSQLLCLE
151 NIKDKPWHIC ASDAIKGEGL QEGVDWLQDQ IQTVKT 
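
The relationship between the ORF coordinates and the peptide can be sketched as follows. This is an illustrative example using the standard genetic code; the function names are hypothetical, and the assumption that the reported ORF end excludes the stop codon is inferred from the numbers in this document (for example 572 - 15 + 1 = 558 = 3 x 186).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(seq, start):
    """Translate from 1-based position `start` until a stop codon or the end."""
    seq = seq.upper()
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Positions 1-50 of the DKFZphtes3_23111 insert.
HEAD = "ATTTGAATCACATTATGGGATTGCTAGACAGACTTTCAGTCTTGCTTGGC"
print(translate(HEAD, 15))          # first residues of the frame-3 peptide
print(orf_peptide_length(15, 572))  # ORF length in residues
```
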


BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_23111, frame 3

TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6
(Arl6) mRNA, complete cds., N = 1, Score = 923, P = 1.1e-92

TREMBL:CEC38D4_5 gene: "C38D4.8"; Caenorhabditis elegans cosmid C38D4,
N = 1, Score = 418, P = 3.6e-39

PIR:S66337 ADP-ribosylation factor 1 - Chlamydomonas reinhardtii, N =
1, Score = 373, P = 2.1e-34

SWISSPROT:ARF1_CHLRE ADP-RIBOSYLATION FACTOR 1., N = 1, Score = 372, P
= 2.7e-34


>TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6
(Arl6) mRNA, complete cds.
Length = 186

HSPs:

Score = 923 (138.5 bits), Expect = 1.1e-92, P = 1.1e-92
Identities = 178/186 (95%), Positives = 184/186 (98%)


Query:   1 MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 60
           MGLLDRLS LLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQ+I+PTIGFSIEKFKS
Sbjct:   1 MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS 60

Query:  61 SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 120
           SSLSFTVFDMSGQGRYRNLWEHYYK+GQAIIFVIDSSD+LRMVVAKEELDTLLNHPDIKH
Sbjct:  61 SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH 120

Query: 121 RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180
           RRIPILFFANKMDLRD+VTSVKVSQLLCLE+IKDKPWHICASDAIKGEGLQEGVDWLQDQ
Sbjct: 121 RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180

Query: 181 IQTVKT 186
           IQ VKT
Sbjct: 181 IQAVKT 186
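
The identity counts in this report can be reproduced from the aligned rows. The sketch below is illustrative only: the function names are hypothetical, and the observation that the printed percentages are truncated rather than rounded is inferred from the figures in this document (184/186 is about 98.9% but is printed as 98%).

```python
def identities(query, sbjct):
    """Count identical, non-gap columns in two aligned rows of equal length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    same = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return same, len(query)


def blast_percent(count, total):
    """Integer percentage, truncated, as the reports here appear to print it."""
    return 100 * count // total


# Residues 1-60 of the query and of mouse Arl6, as aligned in the report.
QUERY = "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS"
SBJCT = "MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS"

same, columns = identities(QUERY, SBJCT)
print(same, columns, blast_percent(same, columns))
print(blast_percent(178, 186), blast_percent(184, 186))
```
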



Pedant information for DKFZphtes3_23111, frame 3 


Report for DKFZphtes3_23111.3


[LENGTH] 186
[MW] 21097.69
[pI] 8.72
[HOMOL] TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 (Arl6) mRNA, complete cds. 4e-94
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL137w] 2e-36
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YBR164c] 2e-32
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 2e-32
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 4e-19
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 4e-19
[FUNCAT] r general function prediction [M. jannaschii, MJ1339] 2e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-04
[BLOCKS] BL01288C
[BLOCKS] BL01020C SAR1 family proteins
[BLOCKS] BL01019C ADP-ribosylation factors family proteins
[BLOCKS] BL01019B ADP-ribosylation factors family proteins
[BLOCKS] BL01019A ADP-ribosylation factors family proteins
[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 2e-45
[SCOP] d1mh1 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-46
[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 5e-37
[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 4e-61
[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) (Do 4e-33
[PIRKW] glycoprotein 2e-33
[PIRKW] monomer 3e-31
[PIRKW] P-loop 2e-35
[PIRKW] lipoprotein 2e-33
[PIRKW] GTP binding 2e-35
[SUPFAM] ADP-ribosylation factor 2e-35
[PROSITE] ATP_GTP_A 1
[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 5.91 %
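
Report lines of this kind can be split into tag, description and E-value for downstream use. The following is a minimal sketch that assumes each line has the form `[TAG] description e-value`, with the trailing E-value optional; the regular expression and the function name are illustrative and are not part of the Pedant software.

```python
import re

# Assumed line shape: "[TAG] free text" optionally followed by an E-value
# written like "2e-35" or "1.1e-92".
REPORT_LINE = re.compile(
    r"^\[(?P<tag>[A-Za-z]+)\]\s+(?P<body>.*?)"
    r"(?:\s+(?P<evalue>\d+(?:\.\d+)?e-\d+))?$"
)


def parse_report_line(line):
    """Return (tag, description, e_value or None), or None if no match."""
    match = REPORT_LINE.match(line.strip())
    if match is None:
        return None
    evalue = match.group("evalue")
    return (match.group("tag"), match.group("body"),
            float(evalue) if evalue else None)


print(parse_report_line("[PIRKW] P-loop 2e-35"))
print(parse_report_line("[KW] LOW_COMPLEXITY 5.91 %"))
```
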


SEQ MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS
SEG . . xxxxxxxxxxx
1hurA CCCCEEEEEETTTTCHHHHHHHHCCCCEEEE--EEETTEEEE

SEQ SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH
SEG
1hurA TTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTriTHra

SEQ RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ
SEG
1hurA TTTEEEEEEETTTTTTTCCHHHHHHHHCGGGTTTTCEEEBECBTTTTBTHHHHHHHHHHH

SEQ IQTVKT
SEG
1hurA HHHHC .
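
The LOW_COMPLEXITY figure appears to be the share of residues masked with `x` on the SEG rows: 11 masked positions out of 186 residues gives 5.91 %. The sketch below illustrates that reading; it is an inference from the numbers in this document rather than a documented definition, and the helper names are hypothetical.

```python
def count_masked(seg_rows):
    """Count positions marked 'x' (either case) across SEG mask rows."""
    return sum(row.lower().count("x") for row in seg_rows)


def low_complexity_percent(masked, length):
    """Masked residues as a percentage of protein length, two decimals."""
    return round(100.0 * masked / length, 2)


# SEG mask row reported for DKFZphtes3_23111.3 (186 residues).
masked = count_masked([". . xxxxxxxxxxx"])
print(masked, low_complexity_percent(masked, 186))
```

The same calculation is consistent with the other records in this section: 68 masked residues out of 387 and 77 out of 898.
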


Prosite for DKFZphtes3_23111.3 
PS00017 24->32 ATP_GTP_A PDOC00017 
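
The ATP_GTP_A hit can be reproduced with a regular expression for the PROSITE P-loop pattern [AG]-x(4)-G-K-[ST]. This is a sketch; the function name is hypothetical. It returns 1-based inclusive coordinates, so the eight-residue match spans 24 to 31, whereas the report prints 24->32. The report's end-coordinate convention is not stated in the document, so the difference is noted here without being resolved.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein):
    """Return (start, end) pairs, 1-based and inclusive, for each P-loop match."""
    return [(m.start() + 1, m.end()) for m in P_LOOP.finditer(protein.upper())]


# Residues 1-50 of the DKFZphtes3_23111 frame-3 peptide.
ARL6_HEAD = "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPT"
print(find_p_loops(ARL6_HEAD))
```
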


Pfam for DKFZphtes3_23111.3


HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM * GMgW f s I Fr )cMWGl WN KEMRI LMLGLDNAGKTTI L YMLKl gE . . I VTT I
MG++ ++ ++GL +KE+++L LGLDN+GKTTI+++LK+ ++
Query 1 -MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNIL 48

HMM PTIGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaD
PTIGF +E+ + ++F+V+D GQ + R +W HYY + ++II+V+DS+D
Query 49 PTIGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSD 98

HMM RDRMeEaKqELHaMLNEEEL. . rDAPlLIFANKQDLPgAMSesEIREaLG
R RM AK+EL+ +LN+ ++ R+ P+L FANK DL++A+++ +++ +L
Query 99 RLRMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLC 148

HMM LHelRCnRPWYIQMCCAVtGEGLYEGMDWLSNYInkRkK*
L++I+ + PW+I +++A++GEGL+EG DWL ++I+ K
Query 149 LENIK-DKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT 186
DKFZphtes3_23nl9


group: testes derived

DKFZphtes3_23nl9 encodes a novel 387 amino acid protein with similarity to rat protein kinase
C-interacting RBCC protein 1.

The novel protein does not contain the RING-B box-coiled coil (RBCC) motif of RBCC protein 1,
and thus is not a member of this subgroup of RING finger proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


similarity to rat protein kinase C-interacting RBCC protein 1
start at Bp 209 matches kozak consensus PyNNatgG
similarity of C-terminal part to N-terminus of RBCK1
Sequenced by LMU
Locus : unknown
Insert length: 1579 bp
Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515
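
The poly A and polyadenylation-signal positions can be located with a simple scan. The sketch below is illustrative: the minimum run length of 10, the 50-base search window and the two hexamers AATAAA and ATTAAA are assumptions, not parameters taken from the document. On the last 79 bases of the insert it finds the A run at offset 1535 and an ATTAAA hexamer at offset 1515, which coincide with the reported positions if those are read as zero-based offsets (an interpretation, not something the document states).

```python
import re


def find_poly_a(seq, min_run=10):
    """Zero-based offset of the first run of at least `min_run` A's, or None."""
    match = re.search("A{%d,}" % min_run, seq.upper())
    return match.start() if match else None


def find_signal(seq, poly_a_start, window=50):
    """Zero-based offset of the last AATAAA/ATTAAA hexamer upstream of the run."""
    region_start = max(0, poly_a_start - window)
    region = seq.upper()[region_start:poly_a_start]
    hits = [region_start + m.start()
            for m in re.finditer("AATAAA|ATTAAA", region)]
    return hits[-1] if hits else None


# Positions 1501-1579 of the DKFZphtes3_23nl9 insert; OFFSET is the number
# of bases that precede this excerpt.
OFFSET = 1500
TAIL = ("CCACAAAATG" "AAACCATTAA" "AGACCCTTAA" "GAGCCAAAAA" "AAAAAAAAAA"
        + "A" * 28 + "G")

run = find_poly_a(TAIL)
signal = find_signal(TAIL, run)
print(OFFSET + run, OFFSET + signal)
```
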


1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 
51 TTGGCCAGAG TGCACCACTC GGCGCCGTCC CACGCCCGAC GCTCTGGGCG 
101 CGCCCGGAAC CCCAGGTTCG CGGGCCGTGT TTCCGACCGG CGGAGGGGGC 
151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 
201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 
251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 
301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 
351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 
401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT 
451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TCCACCAGGA GGGCCTGGAA 
501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 
551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 
601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 
651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 
701 AGCCCTGGAA ACTTGACGGA GAGAGAAGAG CTGGCAGGGA GCCTGGCCCG 
751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 
801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 
851 CCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 
901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 
951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 
1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 
1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 
1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 
1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 
1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 
1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 
1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 
1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 
1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGCCCCCACT GAACTCCGGG 
1451 GACCTCTACT GACTGCTTGC TGGGACAGTC ACCAGGGTTG GGGGGAAGGG 
1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA 
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 209 bp to 1369 bp; peptide length: 387 
Category: similarity to known protein
Classification: Cell signaling/communication


1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP
51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS 
101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA 
151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ 
201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV 
251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL 
301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP 
351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1,
Score = 353, P = 2.8e-32

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2,
complete cds., N = 1, Score = 353, P = 2.8e-32

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P
= 8.5e-25

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score
= 367, P = 9.3e-34


>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds.
Length = 498

HSPs:

Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34

Identities = 95/212 (44%), Positives = 129/212 (60%)


Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234 

+A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRL V++EDA 
Sbjct: 1 MALSLARAVAGGDEQAAIKYATWLAEQRVPLRVQVKPEVSPTQDIRLCVSVEDAYM 56 

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294 

+ + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD 
Sbjct: 57 -HTVTIWLTVRPDMTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115

Query: 295 AFLYLLSAPREAPATGPSPQHPQK MDGELG— RLFPPSLG-LPPG-PQPAASSLP 345 

A+LYLLSA T +PQ Q+ M +LG L S G L P P+P + P 

Sbjct: 116 AYLYLLSARN TSLNPQELQRQRQLRMLEDLGFKDLTLQSRGPLEPVLPKPRTNQEP 171 

Query: 346 SPLQP—SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379 

+P P W CP CTFIN P RPGCEMC RP T+ 
Sbjct: 172 GQPDAAPESPPVGWQCPGCTFINKPTRPGCEMCCRARPETY 212 

Pedant information for DKFZphtes3_23nl9, frame 2 


Report for DKFZphtes3_23nl9.2

[LENGTH] 387
[MW] 39949.29
[pI] 5.53
[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus UbcM4 interacting protein 28 mRNA, complete cds. 1e-22
[BLOCKS] BL00578B
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 17.57 %


SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL 

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee 
SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT

SEG 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEQ VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA 

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV 

SEG xxxxxxxxxxx . - 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL 

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQHPQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 

(No Prosite data available for DKFZphtes3_23nl9.2)
(No Pfam data available for DKFZphtes3_23nl9.2)





DKFZphtes3_26g22


group: intracellular transport/trafficking 

DKFZphtes3_26g22 encodes a novel 898 amino acid protein with similarity to kinesins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a kinesin motor domain
signature. Kinesin is a microtubule-associated force-producing protein that plays a role in
organelle transport. It is an oligomeric complex composed of two heavy chains and two light
chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy
chain contains a large globular N-terminal domain that is responsible for the motor activity
of kinesin; this domain hydrolyzes ATP and binds to and moves along microtubules. Several
proteins involved in chromosome segregation and cell division contain this motor domain, such
as Drosophila claret segregational protein (ncd), Drosophila kinesin-like protein (nod), human
CENP-E and human mitotic kinesin-like protein-1 (MKLP-1). The novel protein is a new
kinesin-like protein.

The new protein can find application in modulating chromosome transport in mitosis and meiosis
and in modulating cell division.


strong similarity to kinesins 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3032 bp 

No poly A stretch found, no polyadenylation signal found 


1 CTGAAGCGCT GGGAGGCGGA CATTAAAGTG AAGTGGTTGC GGTAACCTGG 
51 CCTGGGCCTG AAGTGAGTGA GAGGCACATG AAGAGAAGTA TTCAAGTATT 
101 TATACAGATA GGAATCAAGA TAATCAACAA TGTCTGTCAC TGAGGAAGAC 
151 CTGTGCCACC ATATGAAAGT AGTAGTTCGT GTACGTCCGG AAAACACTAA 
201 AGAAAAAGCA GCTGGATTTC ATAAAGTGGT TCATGTTGTG GATAAACATA 
251 TCCTAGTTTT TGATCCCAAA CAAGAAGAAG TCAGTTTTTT CCATGGAAAG 
301 AAAACTACAA ATCAAAATGT TATAAAGAAA CAAAATAAGG ATCTTAAATT 
351 TGTATTTGAT GCTGTTTTTG ATGAAACGTC AACTCAGTCA GAAGTTTTTG 
401 AACACACTAC TAAGCCAATT CTTCGTAGTT TTTTGAATGG ATATAATTGC 
451 ACAGTACTTG CCTATGGTGC CACTGGTGCT GGGAAGACCC ACACTATGCT 
501 AGGATCAGCT GATGAACCTG GAGTGATGTA TCTAACAATG TTACACCTTT 
551 ACAAATGCAT GGATGAGATT AAAGAAGAGA AAATATGTAG TACTGCAGTT 
601 TCATATCTGG AGGTATATAA TGAACAGATT CGTGATCTCT TAGTAAATTC 
651 AGGGCCACTT GCTGTCCGGG AAGATACCCA AAAAGGGGTG GTCGTTCATG 
701 GACTTACTTT ACACCAGCCC AAATCCTCAG AAGAAATTTT ACATTTATTG 
751 GATAATGGAA ACAAAAACAG GACACAACAT CCCACTGATA TGAATGCCAC 
801 ATCTTCTCGT TCTCATGCTG TTTTCCAAAT TTACTTGCGA CAACAAGACA 
851 AAACAGCAAG TATCAATCAA AATGTCCGTA TTGCCAAGAT GTCACTCATT 
901 GACCTGGCAG GATCTGAGCG AGCAAGTACT TCCGGTGCTA AGGGGACCCG 
951 ATTTGTAGAA GGCACAAATA TTAATAGATC ACTTTTAGCT CTTGGGAATG 
1001 TCATCAATGC CTTAGCAGAT TCAAAGAGAA AGAATCAGCA TATCCCTTAC 
1051 AGAAATAGTA AGCTTACTCG CTTGTTAAAG GATTCTCTTG GAGGAAACTG 
1101 TCAAACTATA ATGATAGCTG CTGTTAGTCC TTCCTCTGTA TTCTACGATG 
1151 ACACATATAA CACTCTTAAG TATGCTAACC GGGCAAAGGA CATTAAATCT 
1201 TCTTTGAAGA GCAATGTTCT TAATGTCAAT AATCATATAA CTCAATATGT
1251 AAAGATCTGT AATGAGCAGA AGGCAGAGAT TTTATTGTTA AAAGAAAAAC 
1301 TAAAAGCCTA TGAAGAACAG AAAGCCTTCA CTAATGAAAA TCACCAAGCA 
1351 AAGTTAATGA TTTCAAACCC TCAGGAAAAA GAAATCGAAA GGTTTCAAGA 
1401 AATCCTGAAC TGCTTGTTCC AGAATCGAGA AGAAATTAGA CAAGAATATC 
1451 TGAAGTTGGA AATGTTACTT AAAGAAAATG AACTTAAATC ATTCTACCAA 
1501 CAACAGTGCC ATAAACAAAT AGAAATGATG TGTTCTGAAG ACAAAGTAGA 
1551 AAAGGCCACT GGAAAACGAG ATCATAGACT TGCAATGTTG AAAACTCGTC 
1601 GCTCCTACCT GGAGAAAAGG AGGGAGGAGG AATTGAAGCA ATTTGATGAG 
1651 AATACTAATT GGCTCCATCG TGTCGAAAAA GAAATGGGAC TCTTAAGTCA 
1701 AAACGGTCAT ATTCCAAAGG AACTCAAGAA AGATCTTCAT TGTCACCATT 
1751 TGCACCTCCA GAACAAAGAT TTGAAAGCAC AAATTAGACA TATGATGGAT 
1801 CTAGCTTGTC TTCAGGAACA GCAACACAGG CAGACTGAAG CAGTATTGAA 
1851 TGCTTTACTT CCAACCCTAA GAAAACAATA TTGCACATTA AAAGAAGCCG 
1901 GCCTGTCAAA TGCTGCTTTT GAATCTGACT TCAAAGAGAT CGAACATTTG 
1951 GTAGAGAGGA AAAAAGTGGT AGTTTGGGCT GAGGAAACTG CCGAACAACC 
2001 AAAGCAAAAC GATCTACCAG GGATTTCTGT TCTTATGACC TTTCCACAAC 
2051 TTGGACCAGT TCAGCCTATT CCTTGTTGCT CATCTTCAGG TGGAACTAAT 
2101 CTGGTTAAGA TTCCTACAGA AAAAAGAACT CGGAGAAAAC TAATGCCATC 
2151 TCCCTTGAAA GGACAGCATA CTCTAAAGTC TCCACCATCT CAAAGTGTGC 
2201 AGCTCAATGA TTCTCTTAGC AAAGAACTTC AGCCTATTGT ATATACACCA 
2251 GAAGACTGTA GAAAAGCTTT TCAAAATCCG TCTACAGTAA CCTTAATGAA 
2301 ACCATCATCA TTTACTACAA GTTTTCAGGC TATCAGCTCA AACATAAACA 
2351 GTGATAATTG TCTGAAAATG TTGTGTGAAG TAGCTATCCC TCATAATAGA 
2401 AGAAAAGAAT GTGGACAGGA GGACTTGGAC TCTACATTTA CTATATGTGA
2451 AGACATCAAG AGCTCGAAGT GTAAATTACC CGAACAAGAA TCACTACCAA
2501 ATGATAACAA AGACATTTTA CAACGGCTTG ATCCTTCTTC ATTCTCAACT 
2551 AAGCATTCTA TGCCTGTACC AAGCATGGTG CCATCCTACA TGGCAATGAC 
2601 TACTGCTGCC AAAAGGAAAC GGAAATTAAC AAGTTCTACA TCAAACAGTT 
2651 CGTTAACTGC AGACGTAAAT TCTGGATTTG CCAAACGTGT TCGACAAGAT 
2701 AATTCAAGTG AGAAGCACTT ACAAGAAAAC AAACCAACAA TGGAACATAA 
2751 AAGT^CATC TGTAAAATAA ATCCAAGCAT GGTTAGAAAA TTTGGAAGAA 
2801 ATATTTCAAA AGGAAATCTA AGATAAATCA CTTCAAAACC AAGCAAAATG 
2851 AAGTTGATCA AATCTGCTTT TCAAAGTTTA TCAATACCCT TTCAAAAATA 
2901 TATTTAAAAT CTTTGAAAGA AGACCCATCT TAAAGCTAAG TTTACCCAAG 
2951 TACTTTCAGC AAGCAGAAAA ATGAAACTCT TTGTTTTCTT CTTTTGTGTT 
3001 CTAAAAAAAT AAAATTTCAA AAGAAAAAAA AA 


BLAST Results 


No BLAST result


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 130 bp to 2823 bp; peptide length: 898 
Category: strong similarity to known protein 

Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (113-121)
KINESIN_MOTOR_DOMAIN1 (252-264)


1 MSVTEEDLCH HMKVVVRVRP ENTKEKAAGF HKVVHVVDKH ILVFDPKQEE
51 VSFFHGKKTT NQNVIKKQNK DLKFVFDAVF DETSTQSEVF EHTTKPILRS
101 FLNGYNCTVL AYGATGAGKT HTMLGSADEP GVMYLTMLHL YKCMDEIKEE
151 KICSTAVSYL EVYNEQIRDL LVNSGPLAVR EDTQKGVVVH GLTLHQPKSS
201 EEILHLLDNG NKNRTQHPTD MNATSSRSHA VFQIYLRQQD KTASINQNVR
251 IAKMSLIDLA GSERASTSGA KGTRFVEGTN INRSLLALGN VINALADSKR
301 KNQHIPYRNS KLTRLLKDSL GGNCQTIMIA AVSPSSVFYD DTYNTLKYAN
351 RAKDIKSSLK SNVLNVNNHI TQYVKICNEQ KAEILLLKEK LKAYEEQKAF
401 TNENHQAKLM ISNPQEKEIE RFQEILNCLF QNREEIRQEY LKLEMLLKEN
451 ELKSFYQQQC HKQIEMMCSE DKVEKATGKR DHRLAMLKTR RSYLEKRREE
501 ELKQFDENTN WLHRVEKEMG LLSQNGHIPK ELKKDLHCHH LHLQNKDLKA
551 QIRHMMDLAC LQEQQHRQTE AVLNALLPTL RKQYCTLKEA GLSNAAFESD
601 FKEIEHLVER KKVVVWADQT AEQPKQNDLP GISVLMTFPQ LGPVQPIPCC
651 SSSGGTNLVK IPTEKRTRRK LMPSPLKGQH TLKSPPSQSV QLNDSLSKEL
701 QPIVYTPEDC RKAFQNPSTV TLMKPSSFTT SFQAISSNIN SDNCLKMLCE
751 VAIPHNRRKE CGQEDLDSTF TICEDIKSSK CKLPEQESLP NDNKDILQRL
801 DPSSFSTKHS MPVPSMVPSY MAMTTAAKRK RKLTSSTSNS SLTADVNSGF
851 AKRVRQDNSS EKHLQENKPT MEHKRNICKI NPSMVRKFGR NISKGNLR


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_26g22, frame 1 

SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13., N = 3,
Score = 874, P = 9e-93

TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds., N = 1, Score
= 880, P = 4.2e-88

TREMBL:SPBC649_1 gene: "SPBC649.01c"; product: "putative kinesin-like
protein"; S.pombe chromosome II cosmid c649., N = 3, Score = 814, P =
9.8e-86

PIR:S64238 kinesin-related protein KIP3 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 802, P = 2.5e-83


>TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds.

Length = 814


HSPs:

Score = 880 (132.0 bits), Expect = 4.2e-88, P = 4.2e-88
Identities = 181/345 (52%), Positives = 238/345 (68%)


Query: 11 HMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFF-HGKKTTNQNVIKKQN 69

++KV VRVRP N +E ++ V+D+ L+FDP +E+ FF G K +++ K+ N

Sbjct: 8 NIKVAVRVRPYNVRELEQKQRSIIKVMDRSALLFDPDEEDDEFFFQGAKQPYRDITKRMN 67

Query: 70 KDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKTHTMLGSADE 129

K L FD VFD ++ ++FE T P++ + LNGYNC+V YGATGAGKT TMLGS
Sbjct: 68 KKLTMEFDRVFDIDNSNQDLFEECTAPLVDAVLNGYNCSVFVYGATGAGKTFTMLGSEAH 127

Query: 130 PGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVREDTQKGVVV 189
PG+ YLTM L+ + + + VSYLEVYNE + +LL SGPL +RED GVVV

Sbjct: 128 PGLTYLTMQDLFDKIQAQSDVRKFDVGVSYLEVYNEHVMNLLTKSGPLKLREDNN-GVVV 186

Query: 190 HGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNV 249

GL L S+EE+L +L GN +RTQHPTD NA SSRSHA+FQ+++R ++ + V

Sbjct: 187 SGLCLTPIYSAEELLRMLMLGNSHRTQHPTDANAESSRSHAIFQVHIRITERKTDTKRTV 246

Query: 250 RIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKRKNQHIPYRN 309

K+S+IDLAGSERA+++ G RF EG +IN+SLLALGN IN LAD + HIPYR+
Sbjct: 247 ---KLSMIDLAGSERAASTKGIGVRFKEGASINKSLLALGNCINKLADGLK---HIPYRD 300

Query: 310 SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI 355

S LTR+LKDSLGGNC+T+M+A VS SS+ Y+DTYNTLKYA+RAK I
Sbjct: 301 SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI 346


Pedant information for DKFZphtes3_26g22, frame 1 


Report for DKFZphtes3_26g22.1


[LENGTH] 898
[MW] 102281.63
[pI] 9.09
[HOMOL] SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13. 3e-97
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKL079w] 4e-28
[BLOCKS] BL00411H
[BLOCKS] BL00411G
[BLOCKS] BL00411F
[BLOCKS] BL00411E Kinesin motor domain proteins
[BLOCKS] BL00411C Kinesin motor domain proteins
[BLOCKS] BL00411B Kinesin motor domain proteins
[BLOCKS] BL00411A Kinesin motor domain proteins
[SCOP] d2kin.1 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 1e-117
[SCOP] d3kar_ 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 1e-112
[PIRKW] nucleus 6e-87
[PIRKW] heterodimer 4e-58
[PIRKW] DNA binding 9e-60
[PIRKW] heterotetramer 2e-54
[PIRKW] mitosis 9e-60
[PIRKW] microtubule binding 4e-68
[PIRKW] ATP 6e-87
[PIRKW] phosphoprotein 5e-59
[PIRKW] heterotrimer 4e-68
[PIRKW] purine nucleotide binding 1e-26
[PIRKW] P-loop 6e-87
[PIRKW] coiled coil 4e-68
[PIRKW] heptad repeat 3e-62
[PIRKW] methylated amino acid 2e-54
[PIRKW] hydrolase 2e-54
[PIRKW] GTP binding 1e-60
[PIRKW] cell division 5e-57
[SUPFAM] kinesin-related protein KIP1 3e-50
[SUPFAM] kinesin-related protein CIN8 7e-33
[SUPFAM] kinesin heavy chain 2e-54
[SUPFAM] suppressor protein SMY1 1e-26
[SUPFAM] kinesin-related protein KIF3 4e-68
[SUPFAM] kinesin-related protein KIF2 1e-46
[SUPFAM] kinesin-related protein unc-104 7e-60
[SUPFAM] unassigned kinesin-related proteins 6e-87
[SUPFAM] centromere protein E 3e-54
[SUPFAM] kinesin-related protein KLP61F 5e-57
[SUPFAM] kinesin-related protein MKLP-1 2e-28
[SUPFAM] pleckstrin repeat homology 7e-60
[SUPFAM] kinesin-related protein KIF1B 4e-61
[SUPFAM] kinesin motor domain homology 6e-87
[SUPFAM] kinesin-related protein KLPA 1e-43
[SUPFAM] kinesin-related protein nodA 1e-30
[SUPFAM] kinesin-related protein Eg5 5e-59
[PROSITE] ATP_GTP_A 1
[PROSITE] KINESIN_MOTOR_DOMAIN1 1
[PFAM] Kinesin motor domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 8.57 %

SEQ MSVTEEDLCHHMKVWRVRPENTKEKAAGFHKWHWDKHILVFDPKQEEVSFFHGKKTT 

SEG 

31car^ TBEEE 

SEQ NQNVIKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKT 
SEG 

3kar- EEEETTTTTTEEEEEETEEETTTTCHHHHHHHHHH-HHHGGGGCCCEEEEEECTTTTCHH 

SEQ HTMLGSADEPGVMYLTMLHLYKCMDEXKEEKICSTAVSYLEVYNEQI RDLLVNSGPLAVR 

SEG 

3 ka r - HHHHTTTT- -THHHHHHHHHHHHHHHHGGGCEEEEEEEEEEEETTEEEETT -TCCCCEEE 

SEQ EDTQKGVWHGLTLHQPKSSEEI LHLLDNGNKNRTQHPTDMNATSSRSHAVFQI YLRQQD 
SEG 

3kar- eettteeeeettcceeeccggghhhhhhhhhhhhccttttchhhhhhceeeeeeeeee^ 

SEQ ktasinqnvriakmslidlagserastsgakgtrfvegtninrsllalgnvinaladskr 

SEG 

3kar- TTTTCEE EEEEEEEECCCCCCCCCC HHHHHHHHHHHHHHHHHHHHHHHHTTTT 

SEQ knqhipyrnskltrllkdslggncqtimiaavspssvfyddtyntlkyanrakdiksslk 

SEG xxxxx 

3 ka r- tttccttttthhhhhhgggctttteeeeeeeecccggghhhhhhhhhhhhh 


SEQ snvlnvnmhitqyvkicneqkaei lllkeklkayeeqkaftnenoqaklmisnpqekei e 

SEG xxxxxxxx xxxxxxxxxxxxxxxxxxxxx 

3kar- 

SEQ rfqeilnclfqnreeirqeylklemllkenelksfyqqqchkqiemmcsedkvekatgkr 

SEG xxxxxxxxxxxxx 

3kar- 

SEQ dhrlamlktrrsylekrreeelkqfdentnwlhrvekemgllsqnghipkelkkdlhchh 

SEG xxxxxxxxxxx 

3kar- 


SEQ lhlqnkdlkaqirhmmdlaclqeqqhrqteavlnallptlrkqyctlkeaglsnaafesd 

SEG XXX 

3kar- 

SEQ fkeiehlverkkvwwadqtaeqpkqndlpgisvlmtfpqlgpvqpipccsssggtnlvk 

SEG 

3kar- 

SEQ I PTEKRTRRKLMPSPLKGQHTLKSPPSQSVQLNDSLSKELQPI VYTPEDCRKAFQNPSTV 

SEG 

3kar- 

SEQ tlmkpssfttsfqaissninsdnclkmlcevaiphnrrkecgqedldstfticedikssk 

SEG 

3kar- 

SEQ cklpeqeslpndnkdilqrldpssfstkhsmpvpsmvpsymamttaakrkrkltsstsns 

SEG XXXXXXXXXXXXX 

3kar- 

SEQ SLTADVNSGFAKRVRQDNSSEKHLQENKPTMEHKRNICKINPSMVRKFGRNISKGNLR 

SEG XXX 

3kar- 


Prosite for DKFZphtes3_26g22.1 

PS00017 113->121 ATP_GTP_A PDOC00017 
PS00411 252->264 KINESIN_MOTOR_DOMAIN1 PDOC00343 


Pfam for DKFZphtes3_26g22.1 


HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds 

R+RP N +E+++G +VV + + + + +++E S 
Query 17 RVRPENTKEKAAGFHKWHWD-KHILVFDPKQEEVSFFHGKKTTNQNV 64 

PhksFtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTIFAYGO 

+ F+FD VF+ ++TQ +V++ + PI+ ++++GYNCT++AYG 
Query 65 IKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYKCTVLAYGA 114 

TGSGKTYTMMGpggehPDHmGIIPRcCHDiFdrldkfqekDhciFWhVkCS 
TG+GKT+TM G + D+ G+ + +++++ D + + + +S 
Query 115 TGAGKTHTMLG SADEPGVMYLTMLHLYKCKDEIK-EEKIC-STAVS 158 

HMM YMEI YNEe I YDLLCPn PqhMkpLn IHEH PNMGpYVqGCTEf HVcSY eDac 

Y+E+YNE+I+DLL+ N ++PL+++E+ G+ V G+T+ +S E+++ 
Query 159 YLEVYNEQIRDLLV-N— SGPLAVREDTQKGVWHGLTLHQPKSSEEIL 204 

HMM hWIWqGnknRHVAaTnMNdhSSRSHtlFTIHVeOrHk. .qcdehvcHSKM 

H+++ GNKNR+ +T MN++SSRSH++F+I ++Q K + V++ KM 
Query 205 HLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNVRIAKM 254 

HMM NLVDLAGSERvnrTGAEGQRlKEGcNINqSLttLGnVInaLaDgqTKYmY 

+L+DLAGSER++ +GA G+R+ EG+NIN+SL++IXSNVINALAD + 
Query 255 SLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSK 299 

HMM ggJigHIPYRDSKLTWlLQDSLGGNcKTcMIACIWPadWNYEETLSTLRYA 
+++HIPYR SKLT+LL+DSLGGNC T MIA+++P+ + Y++T +TL+YA 
Query 300 RKNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYA 349 

HMM dRAKnlkNkPQINEDPcaraalWRrYheQIqdMKhqL* 

+RAK+IK +N++ ++Y+ + K++ 
Query 350 NRAKDIKSSLKSNVLNVN-NHITQYVKICNEQKAEI 384 
DKFZphtes3_27d1 


group: metabolism 

DKFZphtes3_27d1 encodes a novel 712 amino acid protein similar to ubiquitin-specific proteases 
(EC 3.1.2.15). 

The novel protein contains both ubiquitin carboxyl-terminal hydrolases family 2 signatures 
(signature 1 and signature 2). Pfam predicts a new member of the ubiquitin carboxyl-terminal hydrolases 
family 2. The ubiquitin system is responsible for the turnover of proteins. Ubiquitin 
carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. 

The novel protein is a new member of the ubiquitin carboxyl-terminal hydrolases family 2, 
represented by proteins such as yeast UBP1-16, human tre-2, human isopeptidase T and others. 

The novel protein can find application in the modulation of ubiquitin and protein metabolism in 
cells. 


similarity to ubiquitin-specific proteases 

complete cDNA, complete cds, 4 EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 2871 bp 

Poly A stretch at pos. 2836, no polyadenylation signal found 


1 CCAAACCTGA AAGAGGTTGA TTTGTAATGA TTTGCAGGGG GGCACTGGAG 
51 GCAGCGGCCA GGACTTTTCA CTTAGGAGAT CAGCATTTGC CCTGATGGAA 
101 ACTGGGCGAT CCTGCAGGGA CTGACCTCTG AGTTATCCAA AGGCCGACCT 
151 GGGGAAAGAC TGATTTTGAG GTTTTAATAG TTTTCAGATG CTTCAAGTGT 
201 TGTGAACAGA GACTTGTTTG GATTATGCAT TTCTCAGCTA GACTAAATAA 
251 ATGCTAGCAA TGGATACGTG CAAACATGTT GGGCAGCTGC AGCTTGCTCA 
301 AGACCATTCC AGCCTCAACC CTCAGAAATG GCACTGTGTG GACTGCAACA 
351 CGACCGAGTC CATTTGGGCT TGCCTTAGCT GCTCCCATGT TGCCTGTGGA 
401 AGATATATTG AAGAGCATGC ACTCAAGCAC TTTCAAGAAA GCAGTCATCC 
451 TGTTGCATTG GAGGTGAATG AGATGTACGT TTTTTGTTAC CTTTGTGATG 
501 ATTATGTTCT GAATGATAAC GCAACTGGAG ACCTGAAGTT ACTACGACGT 
551 ACATTAAGTG CCATCAAAAG TCAAAATTAT CACTGCACAA CTCGTAGTGG 
601 GAGGTTTTTA CGGTCCATGG GTACAGGTGA TGATTCTTAT TTCTTACATG 
651 ACGGTGCCCA ATCTCTGCTT CAAAGTGAAG ATCAACTGTA TACTGCTCTT 
701 TGGCACAGGA GAAGGATACT AATGGGTAAA ATCTTTCGAA CATGGTTTGA 
751 ACAATCACCC ATTGGAAGAA AAAAGCAAGA AGAACCATTT CAGGAGAAAA 
801 TAGTAGTAAA AAGAGAAGTA AAGAAAAGAC GGCAGGAATT GGAGTATCAA 
851 GTTAAAGCAG AATTGGAAAG TATGCCTCCA AGAAAGAGTT TACGTTTACA 
901 AGGGCTCGCT CAGTCGACCA TAATAGAAAT AGTTTCTGTT CAGGTGCCAG 
951 CACAAACGCC AGCATCACCA GCAAAAGATA AAGTACTCTC TACCTCAGAA 
1001 AATGAAATAT CTCAAAAAGT CAGTGACTCC TCAGTTAAAC GAAGGCCAAT 
1051 AGTAACTCCT GGTGTAACAG GATTGAGAAA TTTGGGAAAT ACTTGCTATA 
1101 TGAATTCTGT TCTTCAGGTG TTGAGTCATT TACTTATTTT TCGACAATGT 
1151 TTTTTAAAGC TTGATCTGAA CCAATGGCTG GCTATGACTG CTAGCGAGAA 
1201 GACAAGATCT TGTAAGCATC CACCAGTCAC AGATACAGTA GTATATCAAA 
1251 TGAATGAATG TCAGGAAAAA GATACAGGTT TTGTTTGCTC CAGACAATCA 
1301 AGTCTGTCAT CAGGACTAAG TGGTGGAGCA TCAAAAGGTA GAAAGATGGA 
1351 ACTTATTCAG CCAAAGGAGC CAACTTCACA GTACATTTCT CTTTGTGATG 
1401 AATTGCATAC TTTGTTCCAA GTCATGTGGT CTGGAAAGTG GGCGTTGGTC 
1451 TCACCATTTG CTATGCTACA CTCAGTGTGG AGACTCATTC CTGCCTTTCG 
1501 TGGTTACGCC CAACAAGACG CTCAGGAATT TCTTTGTGAA CTTTTAGATA 
1551 AAATACAACG TGAATTAGAG ACAACTGGTA CCAGTTTACC AGCTCTTATC 
1601 CCCACTTCTC AAAGGAAACT CATCAAACAA GTTCTGAATG TTGTAAATAA 
1651 CATTTTTCAT GGACAACTTC TTAGTCAGGT TACATGTCTT GCATGTGACA 
1701 ACAAATCAAA TACCATAGAA CCTTTCTGGG ACTTGTCATT GGAGTTTCCA 
1751 GAAAGGTATC AATGCAGTGG AAAAGATATT GCTTCCCAGC CATGTCTGGT 
1801 TACTGAAATG TTGGCCAAAT TTACAGAAAC TGAAGCTTTA GAAGGAAAAA 
1851 TCTACGTATG TGACCAGTGT AACTCAAAGC GTAGAAGGTT TTCCTCCAAA 
1901 CCAGTTGTAC TCACAGAAGC CCAGAAACAA CTTATGATAT GCCACCTACC 
1951 TCAGGTTCTC AGACTGCACC TCAAACGATT CAGGTGGTCA GGACGTAATA 
2001 ACCGAGAGAA GATTGGTGTT CATGTTGGCT TTGAGGAAAT CTTAAACATG 
2051 GAGCCCTATT GCTGCAGGGA GACCCTGAAA TCCCTCAGAC CAGAATGCTT 
2101 TATCTATGAC TTGTCCGCGG TGGTGATGCA CCATGGGAAA GGATTTGGCT 
2151 CAGGGCACTA CACTGCCTAC TGCTATAATT CTGAAGGAGG GTTCTGGGTA 
2201 CACTGCAATG ATTCCAAACT AAGCATGTGC ACTATGGATG AAGTATGCAA 
2251 GGCTCAAGCT TATATCTTGT TTTATACCCA ACGAGTTACT GAGAATGGAC 
2301 ATTCTAAACT TTTGCCTCCA GAGCTCCTGT TGGGGAGCCA ACATCCCAAT 
2351 GAAGACGCTG ATACCTCGTC TAATGAAATC CTTAGCTGAT CCAAAGACAA 
2401 TGGGGTTTTC TTCCTGTGAT TTATATATAT ACTTTTTAAA AGACTGATGT 
2451 ACCATTTTAA ACTTCATTTT TTCTTGTGAA TCAGTGTATA CTACATTTAT 
2501 ACATTTTATA TCTAACAATT XTTTTTTTTT ACAAAGTATA AATGTATATA 
2551 TCAACTGAAG GTAACTACTT TTTTCATATT TGGAGTTTTA AACTTTTGGT 
2601 GTTTACCTCA GACTGATGTT ACCTCTTTTA TATTTTTATG TCTTAATTGG 
2651 CTCGGATGAT GAACTTGTGC AATCTTCTAC CAACAAAGTT CAAGTGGCAT 
2701 CATTTTATAT ACATGTATCT TTTTCAGGTA TTTTCTATAC AAATTCTTAA 
2751 TAGATGGAAA ATTAGACTCT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA 
2851 AAAAAAAAAA AAAAAAAAAG G 


BLAST Results 


NO BLAST result 


Medline entries 


98072201: 

Regulation of ubiquitin-dependent processes by deubiquitinating 
enzymes. 

98431658: 

The ubiquitin system. 


Peptide information for frame 2 


ORF from 251 bp to 2386 bp; peptide length: 712 
Category: similarity to known protein 
Prosite motifs: UCH_2_1 (274-290) 
UCH_2_2 (619-638) 


1 MLAMDTCKHV GQLQLAQDHS SLNPQKWHCV DCNTTESIWA CLSCSHVACG 
51 RYIEEHALKH FQESSHPVAL EVNEMYVFCY LCDDYVLNDN ATGDLKLLRR 
101 TLSAIKSQNY HCTTRSGRFL RSMGTGDDSY FLHDGAQSLL QSEDQLYTAL 
151 WHRRRILMGK IFRTWFEQSP IGRKKQEEPF QEKIVVKREV KKRRQELEYQ 
201 VKAELESMPP RKSLRLQGLA QSTIIEIVSV QVPAQTPASP AKDKVLSTSE 
251 NEISQKVSDS SVKRRPIVTP GVTGLRNLGN TCYMNSVLQV LSHLLIFRQC 
301 FLKLDLNQWL AMTASEKTRS CKHPPVTDTV VYQMNECQEK DTGFVCSRQS 
351 SLSSGLSGGA SKGRKHELIQ PKEPTSQYIS LCKELHTLFQ VMWSGKWALV 
401 SPFAMLHSVW RLIPAFRGYA QQDAQEFLCE LLDKIQRELE TTGTSLPALI 
451 PTSQRKLIKQ VLNVVNNIFH GQLLSQVTCL ACDNKSNTIE PFWDLSLEFP 
501 ERYQCSGKDI ASQPCLVTEM LAKFTETEAL EGKIYVCDQC NSKRRRFSSK 
551 PVVLTEAQKQ LMICHLPQVL RLHLKRFRWS GRNNREKIGV HVGFEEILNM 
601 EPYCCRETLK SLRPECFIYD LSAVVMHHGK GFGSGHYTAY CYNSEGGFWV 
651 HCNDSKLSMC TMDEVCKAQA YILFYTQRVT ENGHSKLLPP ELLLGSQHPN 
701 EDADTSSNEI LS 
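
The ORF coordinates and the peptide listing above can be cross-checked against the nucleotide listing. The following is a minimal sketch, assuming the ORF coordinates are 1-based and inclusive and exclude the stop codon, and assuming the standard genetic code; the function names are illustrative and are not part of the original report.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Number of residues for an ORF given 1-based inclusive coordinates.

    Assumes the stop codon is not part of the stated range.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate codon by codon; unreadable codons (e.g. OCR damage) become 'X'."""
    dna = dna.upper().replace(" ", "")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# First 30 bases of the ORF, copied from positions 251-280 of the listing.
orf_start = "ATGCTAGCAA TGGATACGTG CAAACATGTT"

print(orf_peptide_length(251, 2386))  # residues implied by the ORF range
print(translate(orf_start))           # first ten residues of the peptide
```

Under these assumptions the stated range gives 712 residues, and the first ten codons translate to the first ten residues of the listed peptide.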

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27d1, frame 2 

PIR:S57591 hypothetical protein YMR223w - yeast (Saccharomyces 
cerevisiae), N = 4, Score = 218, P = 8.4e-38 

SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 
3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING 
PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055)., N = 2, Score = 
300, P = 9.3e-31 

TREMBL:AF079565_1 gene: "Ubp41"; product: "ubiquitin-specific protease 
UBP41"; Mus musculus ubiquitin-specific protease UBP41 (Ubp41) mRNA, 
complete cds., N = 3, Score = 187, P = 8.7e-30 

PIR: 158376 hypothetical protein unp - mouse, N = 3, Score = 214, P = 
1.2e-28 

>SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) 
(DEUBIQUITINATING ENZYME 11) (KIAA0055). 
Length = 1,118 

HSPs: 


Score = 300 (45.0 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 95/301 (31%), Positives = 149/301 (49%) 

Query: 381 LCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYAQQDAQEFLCELLDKIQREL- 439 

+ E + + +W+G++ +SP +■»- ++ F GY+QQD+QE L L+D + +L 

Sbjct: 826 VAEEFGIIMKALWTGQYRYISPKDFKITIGKINDQFAGYSQQDSQELLLFLMDGLHEDLN 885 

Query: 440 ETTGTSLPALIPTSQRKLIKQVLN--VVNNIFHGQLLSQVTCLACDNKSNT 488 
           E L + LN ++ +F GQ S V CL C KS T 
Sbjct: 886 KADNRKRYKEENNDHLDDFKAAEHAWQKHKQLNESIIVALFQGQFKSTVQCLTCHKKSRT 945 

Query: 489 IEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQCNSKRRRFS 548 
           E F LSL +C+ +D CL + +K E + + + C C ++R 
Sbjct: 946 FEAFMYLSLPLASTSKCTLQD CL-- RLFSK-- EEKLTDNNRFYCSHCRARR 992 

Query: 549 SKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFE-EILNMEPYCC-- 605 
           ++ K++ I LP VL +HLKRF + GR ++K+ V F E L++ Y 
Sbjct: 993 DSLKKIEIWKLPPVLLVHLKRFSYDGRW-KQKLQTSVDFPLENLDLSQYVIGP 1044 

Query: 606 RETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMCTMDEV 665 
           + LK Y + L +V H+G G GHYTAYC N+ V +D ++S ++ V 
Sbjct: 1045 KNNLKK YNLFSVSNHYG-GLDGGHYTAYCKNAARQRWFKFDDHEVSDISVSSV 1096 

Query: 666 CKAQAYILFYTQ RVTE 681 
           + AYILFYT RVT+ 
Sbjct: 1097 KSSAAYILFYTSLGPRVTD 1115 


Score = 126 (18.9 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 41/116 (35%), Positives = 63/116 (54%) 

Query: 200 QVKAELESMPPR— KSLRLQGLAQSTIIEIVSVQVPAQTPASPAKDKVLSTSENEISQKV 257 

Q+ AE + P + +S + Qr 1+ + P TP ++K + EXS ++ 

Sbjct: 701 QIPAERDREPSKLKRSYSSPDITQA— IQEEEKRKPTVTPTVNRENKPTCYPKAEIS-RL 757 

Query: 258 SDSSVKR-RPIVT PGVTGLRNLGNTCYMNSVLQVLS HLLIF--RQCFLKLDLNQ 308 

S S ++ P+ P +TGLRNLGNTCYMNS + LQ L HL + R C+ D+N+ 

Sbjct: 758 SASQIRNLNPVFGGSGPALTGLRNLGNTCYMNSILQCLCNAPHLADYFNRNCYQD-DINR 816 

Score = 50 (7.5 bits), Expect = 8.3e-23, Sum P(2) = 8.3e-23 
Identities = 29/106 (27%), Positives = 51/106 (48%) 

Query: 173 RKKQEEPFQEKIWKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQV 232 

+ KQE+ +E+ +++ K R++E E+K+E+ + QA+++SQ 
Sbjct: 475 KNKQEKELRERQQEEQKEKLRKEEQEQKAKKKQEA-EENEITEKQQKAKEEMEKKESEQA 533 

Query: 233 PAQ TPASPAKD KVLSTSENEIS--QKVSDSSVKRRPIVTPGV 272 

+ T A K+ K S SE+E S +K + KR P TP + 
Sbjct: 534 KKEDKETSAKRGKEITGVKRQSKSEHETSDAKKSVEDRGKRCP—TPEI 580 

Score = 42 (6.3 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22 
Identities = 13/58 (22%), Positives = 27/58 (46%) 

Query: 167 EQSPIGRKKQEEPFQEKIWKREVKKRRQELEY-QVKAELESMPPRKSLRLQGLAQST 223 

EQ +KKQE E + + + K+ ++ E Q K E + ++ + G+ + + 
Sbjct: 498 EQEQKAKKKQEAEENEITEKQQKAKEEMEKKESEQAKKEDKETSAKRGKEITGVKRQS 555 
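
The percentages in the HSP summary lines above can be recomputed from the raw counts. The following is a minimal sketch; the parser and its names are illustrative assumptions, not part of the BLAST program. It assumes, based only on the values printed in this report, that percentages are truncated rather than rounded, and it checks that the printed bit values are a constant multiple of the raw scores.

```python
import re

# Matches lines such as "Identities = 95/301 (31%), Positives = 149/301 (49%)".
HSP_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def hsp_percentages(line):
    """Return (identity %, positive %) recomputed with integer truncation."""
    match = HSP_RE.search(line)
    if match is None:
        raise ValueError("not an HSP summary line")
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return 100 * ident // ident_len, 100 * pos // pos_len


# (raw score, bits) pairs copied from the four HSPs reported above.
score_bits = [(300, 45.0), (126, 18.9), (50, 7.5), (42, 6.3)]
ratios = [round(bits / score, 2) for score, bits in score_bits]

print(hsp_percentages("Identities = 95/301 (31%), Positives = 149/301 (49%)"))
print(ratios)
```

For the first HSP, 95/301 is about 31.6 %, which is printed as 31 %, consistent with truncation. All four HSPs give the same bits-to-score ratio.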

Pedant information for DKFZphtes3_27d1, frame 2 


Report for DKFZphtes3_27d1.2 


[LENGTH] 712 

[MW] 81155.71 

[pI] 8.21 

[HOMOL] SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055). 4e-32 

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YMR223w] 5e-33 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YBL067c] 3e-19 

[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 4e-17 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHL010c] 3e-12 

[BLOCKS] BL00970A Nuclear transition protein 2 proteins 

[BLOCKS] BL00972D 

[BLOCKS] BL00972C 

[BLOCKS] BL00972B 

[BLOCKS] BL00972A 

[EC] 3.1.2.15 Ubiquitin thiolesterase 5e-06 

[PIRKW] alternative splicing 2e-11 

[PIRKW] thiolester hydrolase 5e-06 

[PIRKW] hydrolase 1e-14 

[SUPFAM] RING finger homology 7e-11 

[SUPFAM] deubiquitinating enzyme SSV7 5e-16 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] UCH_2_2 1 

[PROSITE] PKC_PHOSPHO_SITE 17 

[PROSITE] ASN_GLYCOSYLATION 4 

[PROSITE] UCH_2_1 1 

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.92 % 



SEQ MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH 

SEG 

PRD ccccccccchhhhhhhhcccccccccceeecccceeeeeeeccccccccchhhhhhhhhh 

SEQ FQESSHPVALEVNEMYVFCYLCDDYVLNDNATGDLKLLRRTLSAIKSQNYHCTTRSGRFL 

SEG 

PRD hhhhccceeecccceeeeeeccccccccccccchhhhhhhhhhhhhcccceeeccccccc 

SEQ RSMGTGDDSYFLHDGAQSLLQSEDQLYTALWHRRRILMGKIFRTWFEQSPIGRKKQEEPF 

SEG 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhh 

SEQ QEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQVPAQTPASP 

SEG xxxxxxxxxxxxxxxx 

PRD hheeehhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccccccc 

SEQ AKDKVLSTSENEISQKVSDSSVKRRPIVTPGVTGLRNLGNTCYMNSVLQVLSHLLIFRQC 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ FLKLDLNQWLAMTASEKTRSCKHPPVTDTVVYQMNECQEKDTGFVCSRQSSLSSGLSGGA 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhchhhhhhhhhhhhhhccccccceeehhhhhcccccccccccccccccccccccc 

SEQ SKGRKMELIQPKEPTSQYISLCHELHTLFQVMWSGKWALVSPFAMLHSVWRLI PAFRGYA 

SEG xxxxx 

PRD ccccceeecccccccchhhhhhhhhhhhhhhhhccceeeeccchhhhhhhhhhhccccch 

SEQ QQDAQEFLCELLDKIQRELETTGTSLPALIPTSQRKLIKQVLNVVNNIFHGQLLSQVTCL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhccccchhhhhhhJic 

SEQ ACDNKSNTIEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQC 

SEG 

PRD cccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhccceeecccc 

SEQ NSKRRRFSSKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFEEILNM 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccccceeeeccccccc 

SEQ EPYCCRETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMC 

SEG 

PRD ccccccccccccccceeeeeeeeeeeecccccccccceeeeccccccceeeecccccccc 

SEQ TMDEVCKAQAYILFYTQRVTENGHSKLLPPELLLGSQHPNEDADTSSNEILS 

SEG 

PRD cchhhhhhhhhhhhhheeeecccccccccccccccccccccccccccccccc 
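
The LOW_COMPLEXITY keyword value can be related to the SEG mask shown in the SEQ/SEG/PRD listing above. The following is a minimal sketch, assuming the value is the fraction of residues masked with "x" on the SEG lines. The run lengths below were read from the three non-empty SEG lines of this listing and may be affected by OCR; the function name is illustrative.

```python
def low_complexity_percent(masked_run_lengths, protein_length):
    """Percentage of residues masked as low complexity, to two decimals."""
    masked = sum(masked_run_lengths)
    return round(100.0 * masked / protein_length, 2)


# Lengths of the three 'x' runs on the SEG lines above (assumed counts).
seg_runs = [16, 14, 5]

print(low_complexity_percent(seg_runs, 712))
```

With these counts, 35 of 712 residues are masked, which reproduces the reported 4.92 %.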
Prosite for DKFZphtes3_27d1.2 


PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00001 90->94 ASN_GLYCOSYLATION PDOC00001 
PS00001 484->488 ASN_GLYCOSYLATION PDOC00001 
PS00001 653->657 ASN_GLYCOSYLATION PDOC00001 
PS00004 545->549 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 506->509 PKC_PHOSPHO_SITE PDOC00005 
PS00005 542->545 PKC_PHOSPHO_SITE PDOC00005 
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005 
PS00005 580->583 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 611->614 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 223->227 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 525->529 CK2_PHOSPHO_SITE PDOC00006 
PS00006 661->665 CK2_PHOSPHO_SITE PDOC00006 
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006 
PS00007 193->200 TYR_PHOSPHO_SITE PDOC00007 
PS00007 192->200 TYR_PHOSPHO_SITE PDOC00007 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 355->361 MYRISTYL PDOC00008 
PS00008 359->365 MYRISTYL PDOC00008 
PS00008 471->477 MYRISTYL PDOC00008 
PS00008 589->595 MYRISTYL PDOC00008 
PS00009 171->175 AMIDATION PDOC00009 
PS00009 362->366 AMIDATION PDOC00009 
PS00972 274->290 UCH_2_1 PDOC00750 
PS00973 619->638 UCH_2_2 PDOC00750 
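
The Prosite coordinates listed above can be reproduced for the simple site patterns with a regular-expression scan. The following is a minimal sketch, not the scanner used to produce the report. It assumes the published PROSITE patterns N-{P}-[ST]-{P} (ASN_GLYCOSYLATION), [ST]-x-[RK] (PKC_PHOSPHO_SITE) and [ST]-x(2)-[DE] (CK2_PHOSPHO_SITE), and it assumes that the end coordinate in the table is one past the last matching residue, as the 33->37 entry for a four-residue motif suggests. Only the first 60 residues of the peptide are scanned.

```python
import re

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}


def scan(seq, regex):
    """Return (start, end) pairs: 1-based start, end one past the last residue.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for match in re.finditer("(?=(" + regex + "))", seq):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


# Residues 1-60 of the peptide, copied from the first SEQ line of the report.
nterm = "MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH"

for name, regex in PATTERNS.items():
    print(name, scan(nterm, regex))
```

Within the first 60 residues this yields one ASN_GLYCOSYLATION hit at 33->37, one PKC_PHOSPHO_SITE hit at 6->9 and no CK2_PHOSPHO_SITE hit, in agreement with the table.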


Pfam for DKFZphtes3_27d1.2 


HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GIqNlGNTCYMNSIIQCL* 

G++NLGNTCYMNS++Q+L 
Query 274 GLRNLGNTCYMNSVLQVL 291 


HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *YdLYgVICHYGntldyGHYWayVKNenhHRWkWYYFDDEtV* 

YDL +V+ H+G + -f+GHY-f-AY-H+N -»- ++W+ +D++ 
Query 619 YDLSAVVMHHGKGFGSGHYTAYCYNSE— GGFWVHCNDSKL 
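
The middle line of the first Pfam match above (residues 274-291) marks identical positions with a letter and conservative substitutions with "+". The identity part can be recomputed as follows. This is a minimal sketch with an illustrative function name; it compares letters case-insensitively and leaves non-identical positions blank, because placing the "+" marks would require the scoring model of the HMM, which is not given here.

```python
def identity_midline(consensus, query):
    """Return the query letter where both rows agree, else a space."""
    if len(consensus) != len(query):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        q if c.upper() == q.upper() else " " for c, q in zip(consensus, query)
    )


# HMM consensus and query rows copied from the first Pfam match above.
hmm_row = "GIqNlGNTCYMNSIIQCL"
query_row = "GLRNLGNTCYMNSVLQVL"

midline = identity_midline(hmm_row, query_row)
identities = sum(ch != " " for ch in midline)

print(midline)
print(identities, "of", len(query_row))
```

The letters produced this way coincide with the letters on the reported middle line, giving 13 identical positions out of 18.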
group: transmembrane protein 

Summary DKFZphtes3_27k4 encodes a novel 490 amino acid protein with similarity to two 
hypothetical C. elegans proteins. 

The novel protein contains 10 transmembrane regions and a leucine zipper. It is a member of 
a new family of proteins containing 10 transmembrane domains, which is specific to multicellular 
eukaryotes. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 


strong similarity to C. elegans K07H8.2/ZK185.2 
membrane regions: 10 

complete cDNA, complete cds, potential start at bp 109, few EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1901 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 


1 GTGATTTACC AGAAAAACCA AGAAGACAGG CACAAAAAAG CAAACGGCAT 
51 TTGGCAAGAT GGATTATCAA CTGCAGTACA GACTTTTAGT AATAGATCTG 
101 AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT 
151 AATGGGCACG CATCATCAAG CTGCAGCCAA AAGTATGATG ACTATGCCAA 
201 TTATAATTAC TGTGATGGAA GGGAGACTTC AGAAACCACT GCCATGTTAC 
251 AAGATGAAGA TATATCTAGT GATGGTGATG AAGATGCTAT TGTAGAAGTG 
301 ACCCCAAAAT TACCAAAGGA ATCCAGTGGC ATCATGGCAT TGCAAATACT 
351 TGTGCCCTTT TTGCTAGCTG GTTTTGGAAC AGTTTCAGCT GGCATGGTAC 
401 TGGATATAGT ACAGCACTGG GAGGTGTTCA GAAAAGTTAC AGAAGTTTTC 
451 ATTTTAGTCC CTGCACTTCT TGGTCTCAAA GGGAACTTGG AAATGACATT 
501 GGCATCCAGA TTATCCACTG CAGTAAATAT TGGGAAGATG GATTCACCCA 
551 TTGAAAAGTG GAACCTAATA ATTGGCAACT TGGCTTTAAA GCAGGTTCAG 
601 GCAACAGTAG TGGGTTTTCT AGCAGCTGTG GCAGCAATTA TATTGGGCTG 
651 GATTCCAGAA GGAAAATATT ACCTTGATCA TTCCATACTT CTGTGCTCTA 
701 GCAGTGTGGC AACTGCCTTC ATTGCATCTC TTCTGCAGGG AATAATAATG 
751 GTTGGGGTTA TCGTTGGTTC AAAGAAGACT GGTATAAATC CTGATAATGT 
801 TGCTACACCC ATTGCTGCTA GTTTTGGCGA CCTTATAACT CTTGCCATAT 
851 TGGCTTGGAT AAGTCAGGGC TTATACTCCT GTCTTGAGAC CTATTACTAC 
901 ATTTCTCCAT TAGTTGGTGT ATTTTTCTTG GCTCTAACCC CTATTTGGAT 
951 TATAATAGCT GCCAAACATC CAGCCACAAG AACAGTTCTC CACTCAGGCT 
1001 GGGAGCCTGT CATAACAGCT ATGGTTATAA GTAGCATTGG GGGCCTTATT 
1051 CTGGACACAA CTGTATCAGA CCCAAACTTG GTTGGGATTG TTGTTTACAC 
1101 GCCAGTTATT AATGGTATTG GTGGTAATTT GGTGGCCATT CAGGCTAGCA 
1151 GGATTTCTAC CTACCTCCAT TTACATAGCA TTCCAGGAGA ATTGCCTGAT 
1201 GAACCCAAAG GTTGTTACTA CCCATTTAGA ACTTTCTTTG GTCCAGGAGT 
1251 AAATAATAAG TCTGCTCAAG TTCTACTGCT TTTAGTGATT CCTGGACATT 
1301 TAATTTTCCT CTACACTATT CATTTGATGA AAAGTGGTCA TACTTCTTTA 
1351 ACTATAATCT TCATAGTAGT GTATTTATTT GGCGCTGTGT TACAGGTATT 
1401 TACCTTGCTG TGGATTGCTG ACTGGATGGT CCATCACTTC TGGAGGAAAG 
1451 GAAAGGACCC GGATAGTTTC TCCATCCCCT ACCTAACAGC ATTGGGTGAT 
1501 CTGCTCGGGA CAGCTCTGTT AGCCTTAAGT TTTCATTTTC TTTGGCTTAT 
1551 TGGAGATCGA GATGGAGATG TTGGAGACTA ATAAATTCTA CAAACTGCTC 
1601 TCAAGTTACC AAGGAAGAAA ATACACGACA ACCACTTATG GCTCTTTTTC 
1651 AAAACTCTTA AATCAGTAGT TTGACTTTTG CCAGGGTAAT CTTCAGTTGG 
1701 CCCTGATTCA ATTAAATGGC CTTAATTTTT TTTTAAGGAA TTTGTGTCAA 
1751 AACCAGAATG AAGAGTATTC GTGCTGCTTT TCATAGAATA AATGATAATT 
1801 TGACATAGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1851 AAAAAAAAAA AAGGGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
1901 G 


BLAST Results 


No BLAST result 


Medline entries 
No Medline entry 


Peptide information for frame 1 


ORF from 109 bp to 1578 bp; peptide length: 490 
Category: similarity to unknown protein 


1 MEYHSFSEQS FHANNGHASS SCSQKYDDYA NYNYCDGRET SETTAMLQDE 
51 DISSDGDEDA IVEVTPKLPK ESSGIMALQI LVPFLLAGFG TVSAGMVLDI 
101 VQHWEVFRKV TEVFILVPAL LGLKGNLEMT LASRLSTAVN IGKMDSPIEK 
151 WNLIIGNLAL KQVQATVVGF LAAVAAIILG WIPEGKYYLD HSILLCSSSV 
201 ATAFIASLLQ GIIMVGVIVG SKKTGINPDN VATPIAASFG DLITLAILAW 
251 ISQGLYSCLE TYYYISPLVG VFFLALTPIW IIIAAKHPAT RTVLHSGWEP 
301 VITAMVISSI GGLILDTTVS DPNLVGIVVY TPVINGIGGN LVAIQASRIS 
351 TYLHLHSIPG ELPDEPKGCY YPFRTFFGPG VNNKSAQVLL LLVIPGHLIF 
401 LYTIHLMKSG HTSLTIIFIV VYLFGAVLQV FTLLWIADWM VHHFWRKGKD 
451 PDSFSIPYLT ALGDLLGTAL LALSFHFLWL IGDRDGDVGD 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27k4, frame 1 

TREMBL:AF036704_2 gene: "ZK185.2"; Caenorhabditis elegans cosmid 
ZK185., N = 1, Score = 730, P = 3.1e-72 

TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid 
K07H8., N = 1, Score = 940, P = 1.7e-94 


>TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 
Length = 507 

HSPs: 


Score = 940 (141.0 bits), Expect = 1.7e-94, P = 1.7e-94 
Identities = 204/412 (49%), Positives = 271/412 (65%) 

Query: 68 LPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPALLGLKGNL 127 

+P ESS ++ Q+L PF +AG G V AG+VL IV W +F ++ E+ ILVPALLGLKGNL 
Sbjct: 82 IPAESSYVLFFQVLFPFAVAGLGMVFAGLVLSIVVTWPLFEEIPEILILVPALLGLKGNL 141 

Query: 128 EMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILGWIPEGKY 187 

EMTLASRLST N+G MDS ++ + I NLAL QVQATW FLA+ A L -HP G + 
Sbjct: 142 EMTLASRLSTLANLGHMDSSKQRKDVVIANLALVQVQATWAFLASAFAAALAFIPSGDF 201 

Query: 188 YLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFGDLITLAI 247 

H L+C+SS+ATA ASL+ ++MV VIV S+K INPDNVATPIAAS GDL TL + 
Sbjct: 202 DWAHGALMCASSLATACSASLVLSLLMVWIVrSRKYNINPDNVATPIAASLGDLTTLTV 261 

Query: 248 LAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEPVITAMVI 307 

LA+ T +++ +V V FL L P WI lA ++ T+ L++GW PVI +M+I 

Sbjct: 262 LAFFGSVFLKAHNTESWLNVIVIVLFLLLLPFWIKIANENEGTQETLYNGWTPVIMSMLI 321 

Query: 308 SSIGGLILDTTVSDPNLVGIWYTPVINGIGGNLVAIQASRISTYLHLHSIPGELPDEPK 3S7 

SS GG IL+T V + + Y PV+NG+GGNL A+QASR+STY H G LP+E 

Sbjct: 322 SSAGGFILETAVRRYH— SLSTYGPVLNGVGGNLAAVQASRLSTYFHKAGTVGVLPNEWT 379 

Query: 368 GCYYPF— RTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLM KSGHTSLTIIFIVV 421 

+ R FF +++SA+VLLLLV+PGH+ F + I L K+ T -l-F + 

Sbjct: 380 VSRFTSVQRAFFSKEWDSRSARVLLLLVVPGHICFNFLIQLFTLTSKNNVTPHGPLFTSL 439 

Query: 422 YLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSF 4 75 

Y+ A++QV LL++ +V W+ DPD+ IPYLTALGDLLGT LL + F 
Sbjct: 440 YMIAAIIQ'VVILLFVCQLLVALLWKWKIDPDNSVIPYLTALGDLLGTGLLFIVF 4 93 


Pedant information for DKFZphtes3_27k4, frame 1 
Report for DKFZphtes3_27k4.1 


[LENGTH] 490 

[MW] 53266.39 

[pI] 5.29 

[HOMOL] TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 4e-94 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PROKAR_LIPOPROTEIN 2 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 10 

[KW] LOW_COMPLEXITY 3.06 % 


SEQ MEYHSFSEQSFHANNGHASSSCSQKYDDYANYNYCDGRETSETTAMLQDEDISSDGDEDA 
SEG 
PRD cccccccceeeccccccccccccccccccceeecccccccchhhhhhhhcccccccccee 
MEM 

SEQ IVEVTPKLPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPAL 
SEG 
PRD eeeeeccccccchhlihhhhhhhhhhhlicccchhhhlihhhhcchhhhhcccceeeeeeccc 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . MMMMMMMMMMMMMMM 

SEQ LGLKGNLEMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILG 
SEG 
PRD ccccchhhhhhhiihhhhhhccccccccccceeeelihhhhhhhhhhhhhhhhhhhhhhhhh 
MEM MMMMMMM MMMMMMMMMMMMMMMMM 

SEQ WIPEGKYYLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFG 
SEG 
PRD hcccceeecccceeehhhhhhhhhhhhhhhhhhhhheeeecccccccccccccccccccc 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ DLITLAILAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEP 
SEG 
PRD cchhhhhhhhhhhhhJiiihcceeeeehhhhhhhhhhchhhlihhhJiccccccccchhhhhhh 
MEM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ VITAMVISSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPG 
SEG 
PRD iicchhhhhhcceeeeccccccccceeeeeeceeeecccccceeeeehhhhhhhhhhcccc 
MEM MMMMMMMMMMMMMMMM 

SEQ ELPDEPKGCYYPFRTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIV 
SEG 
PRD cccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhcccccccceeeehhh 
MEM MM^4MMMMMMMMMMM^Q1MMMMMMMMMMMM . . .MMMMMMM 

SEQ VYLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSFHFLWL 
SEG xxxxxxxxxxxxxxx 
PRD ]ihhhhh]ihhhhhhhhhhhhhhhhhhhhcccccccceeeeeecchhhhhhhhhhhhheeee 
MEM MMM^ffQOlMMMMHMHMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IGDRDGDVGD 
SEG 
PRD eecccccccc 
MEM MM 


Prosite for DKFZphtes3_27k4.1 


PS00001 383->387 ASN_GLYCOSYLATION PDOC00001 
PS00004 108->112 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005 
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007 
PS00008 90->96 MYRISTYL PDOC00008 
PS00008 122->128 MYRISTYL PDOC00008 
PS00008 216->222 MYRISTYL PDOC00008 
PS00008 220->226 MYRISTYL PDOC00008 
PS00008 254->260 MYRISTYL PDOC00008 
PS00008 336->342 MYRISTYL PDOC00008 
PS00008 339->345 MYRISTYL PDOC00008 
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013 
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 459->481 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphtes3_27k4.1) 
DKFZphtes3_27o14 


group: testes derived 

DKFZphtes3_27o14 encodes a novel 358 amino acid protein with similarity to C. elegans cosmid 
C55A6. 

The new protein contains a C3HC4 zinc finger (RING finger) signature. The RING finger 
structure binds two atoms of zinc and is involved in mediating protein-protein interactions. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to C. elegans C55A6.1 
complete cDNA, complete cds, EST hits 
Sequenced by GBF 

Locus: /map="6" 
Insert length: 2158 bp 

Poly A stretch at pos. 2137, polyadenylation signal at pos. 2120 


1 CCGAGGCCAG AGAGAAAAGA CTGCGAGGTG GCCGCAGCTG TGGCCGGAGA 
51 GCACAAAGAA TGAACCAGCA GTGGAAGAGA AAATACTGTA AGCTGGCTGA 
101 CTGCTGGTGA AGAAAATGCT TTATTTTTGT GGCAGGCATC TGTGGGATCT 
151 GTAATAGAAA TATATTGGAG TAATTCAAGA TTCTGTGGTT GGCCCTTTTG 
201 ACTGCTCTCT CTACAGGTTT AATTTGGGCA TTTACTCATT TTCATGGCTC 
251 CAAGGACCAT GTATGTGTTG GGGATCTTCA ATATTCATGT TATTTTCTCC 
301 TTTGGTCTTA TATGATTGTT ACCTTTATGA AGCTTTAGTG ATTACAAAGC 
351 ACTTTTTTTG TCCATTTTTA CCTGAGCTTT GTAAACTCTG ATTTGCAGGA 
401 TGGCTGGCTG TGGTGAAATT GATCATTCAA TAAACATGCT TCCTACAAAC 
451 AGGAAAGCGA ACGAGTCCTG TTCTAATACT GCACCTTCTT TAACCGTCCC 
501 TGAATGTGCC ATTTGTCTGC AAACATGTGT TCATCCAGTC AGTCTGCCCT 
551 GTAAGCACGT TTTCTGCTAT CTATGTGTAA AAGGAGCTTC ATGGCTTGGA 
601 AAGCGGTGTG CTCTTTGTCG ACAAGAAATT CCCGAGGATT TCCTTGACAA 
651 GCCAACCTTG TTGTCACCAG AAGAACTCAA GGCAGCAAGT AGAGGAAATG 
701 GTGAATATGC ATGGTATTAT GAAGGAAGAA ATGGGTGGTG GCAGTACGAT 
751 GAGCGCACTA GTAGAGAGCT GGAAGATGCT TTTTCCAAAG GTAAAAAGAA 
801 CACTGAAATG TTAATTGCTG GCTTTCTGTA TGTCGCTGAT CTTGAAAACA 
851 TGGTTCAATA TAGGAGAAAT GAACATGGAC GTCGCAGGAA GATTAAGCGA 
901 GATATAATAG ATATACCAAA GAAGGGAGTA GCTGGACTTA GGCTAGACTG 
951 TGATGCTAAT ACCGTAAACC TAGCAAGAGA GAGCTCTGCT GACGGAGCGG 
1001 ACAGTGTATC AGCACAGAGT GGAGCTTCTG TTCAGCCCCT AGTGTCTTCT 
1051 GTAAGGCCCC TAACATCAGT AGATGGTCAG TTAACAAGCC CTGCAACACC 
1101 ATCCCCTGAT GCAAGCACTT CTCTGGAAGA CTCTTTTGCT CATTTACAAC 
1151 TCAGTGGAGA CAACACAGCT GAAAGGAGTC ATAGGGGAGA AGGAGAAGAA 
1201 GATCATGAAT CACCATCTTC AGGCAGGGTA CCAGCACCAG ACACCTCCAT 
1251 TGAAGAAACT GAATCAGATG CCAGTAGTGA TAGTGAGGAT GTATCTGCAG 
1301 TTGTTGCACA GCACTCCTTG ACCCAACAGA GACTTTTGGT TTCTAATGCA 
1351 AACCAGACAG TACCCGATCG ATCAGATCGA TCGGGAACTG ATCGATCAGT 
1401 AGCAGGGGGT GGAACAGTGA GTGTCAGTGT CAGATCTAGA AGGCCTGATG 
1451 GACAGTGCAC AGTAACTGAA GTTTAAATAA AAATGTCTTC AGCTCCATGC 
1501 TCAAGGTTGA AAGGGTTACC TGTAAATTTC TGCCCACATA ACATTATACT 
1551 CATCCCTAGT AGTGCATTTT GGGAGTTGGG GTGGGAAGGG GTATGGGAAG 
1601 GATAGACTCA TAATTAAAAT GTCTAACATG TCTCTGTTGA GAAATTTATT 
1651 TAATGTAAGG AACTTGGGTG TTAATAGTTG AGAGCTGTTT AGTAATAACC 
1701 CAGTTTTCTT GAGGTCTGTT TACTTTATAC TTTTTAAAAA CTTCTGTAGT 
1751 TCTTTTGGCC AGTGTGTTTG TATTATCTGT GCATTAATGG TCCTCATCTG 
1801 ACTCCTGCAT TGTGTCTTAT TTTTCTGCAT GGATTGGCAT AAGACCATTA 
1851 CTAAAATTTG GCACCTGTGA GATGTTTGAT ATTATGAACA GGAAACATAA 
1901 TTTAATGTAT GAATAGATGT GAATTTGGGA TTTCAAAATA GATGAATAAC 
1951 AACTATTTTA TAGTAAAGTT ATTGAAATGG AAATGAAAAC AGCCAGTAAC 
2001 TTATGTTTCA GAATGTTTGT AACACACTTC ATGGTGTTCC CATAGGCTTT 
2051 GCTGTCTAGT CTTATAGTTT GAGGTTTTTT TGGTCTGCAT TTTTCTTTTT 
2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA 
2151 AAAAAAAG 
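
The polyadenylation signal reported for this clone can be located in the listing above with a plain substring search. The following is a minimal sketch, assuming the canonical AATAAA hexamer is the signal meant; the function name is illustrative. The fragment is the last full line of the listing (positions 2101-2150).

```python
def find_signal(fragment, first_coord, signal="AATAAA"):
    """Return the 1-based coordinate of the first occurrence of `signal`.

    `first_coord` is the coordinate of the first base of `fragment`.
    Returns None if the signal is absent.
    """
    seq = fragment.replace(" ", "").upper()
    idx = seq.find(signal)
    return None if idx < 0 else first_coord + idx


# Positions 2101-2150, copied from the listing above.
tail = "GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA"

print(find_signal(tail, 2101))
```

With 1-based counting the hexamer starts at position 2121, one base after the reported position 2120. The difference may reflect a 0-based offset or a different reporting convention in the original analysis; the listing alone does not settle this.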


BLAST Results 


Entry HSG117 from database EMBL: 
human STS SHGC-36270. 
Score = 1148, P = 8.9e-45, identities = 240/250 
Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 400 bp to 1473 bp; peptide length: 358 
Category: similarity to unknown protein 
Prosite motifs: ZINC FINGER C3HC4 (51-61) 


1 MAGCGEIDHS INMLPTNRKA NESCSNTAPS LTVPECAICL QTCVHPVSLP 
51 CKHVFCYLCV KGASWLGKRC ALCRQEIPED FLDKPTLLSP EELKAASRGN 
101 GEYAWYYEGR NGWWQYDERT SRELEDAFSK GKKNTEMLIA GFLYVADLEN 
151 MVQYRRNEHG RRRKIKRDII DIPKKGVAGL RLDCDANTVN LARESSADGA 
201 DSVSAQSGAS VQPLVSSVRP LTSVDGQLTS PATPSPDAST SLEDSFAHLQ 
251 LSGDNTAERS HRGEGEEDHE SPSSGRVPAP DTSIEETESD ASSDSEDVSA 
301 VVAQHSLTQQ RLLVSNANQT VPDRSDRSGT DRSVAGGGTV SVSVRSRRPD 
351 GQCTVTEV 

BLASrP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27ol4, frame 1 

TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6, 
N = 2, Score = 165, P = 4.2e-15 

SWISSPROT:YW26_CAEEL HYPOTHETICAL 39.3 KD PROTEIN C02B8.6 IN CHROMOSOME 
X., N = 2, Score = 136, P = 3.1e-11 


>TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 
Length = 484 

HSPs: 

Score = 165 (24.8 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 42/106 (39%), Positives = 61/106 (57%) 


Query: 75 QEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRN-GWWQYDERTSRELEDAFSKGKK 133 

Q +P LD ++ PEE K Y W Y G+N GWW+++ R RE+E+A++ GK 

Sbjct: 93 QNVPALDLDA-SICDPEERK Y-WIYSGKNQGWWRFEPRNEREIEEAYNAGKC 142 

Query: 134 NTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKR DIID-IPKKGVAGL 180 

+ E++I G YV D +QY R + R +KR DDI KG+AG+ 
Sbjct: 143 HCEWICGRPYVIDFHQFLQYPRGVPNQARHVKRVSADDFDGIGVKGLAGI 193 

Score = 96 (14.4 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 19/54 (35%), Positives = 30/54 (55%) 

Query: 35 ECAICLQTCVHPVSLP-CKHVFCYLCVKGASW--LGKRCALCRQEIPEDFLDKPT 86 

EC IC + P ++P C H FC++C+KG +G C +CR I + +P+ 

Sbjct: 11 ECPICQCKMIVPTTIPACGHKFCFICLKGVYMNDMGG-CPMCRGPIDSNIFAQPS 64 
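
The percentages in the HSP headers above appear to be truncated rather than rounded (42/106 is about 39.6% but is printed as 39%). The following sketch reproduces the printed values under that assumption; the function name is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage, truncated toward zero, as the HSP headers appear to print it."""
    return 100 * matches // aligned_length


# (matches, aligned length) pairs taken from the two HSP headers above.
for label, matches, length in [
    ("Identities", 42, 106),
    ("Positives", 61, 106),
    ("Identities", 19, 54),
    ("Positives", 30, 54),
]:
    print(f"{label} = {matches}/{length} ({blast_percent(matches, length)}%)")
```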

Pedant information for DKFZphtes3_27ol4, frame 1 


Report for DKFZphtes3_27ol4 . 1 

[LENGTH] 358 
[MW] 38818.90 
[pI] 5.17 
[HOMOL] TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 2e-12 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation/farnesylation and processing) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 4e-04 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 4e-04 
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 
[PROSITE] MYRISTYL 2 
[PROSITE] AMIDATION 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] ZINC_FINGER_C3HC4 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Zinc finger, C3HC4 type (RING finger) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 19.83 % 


SEQ MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLPCKHVFCYLCV 
SEG 
1rmd- TTTTTEETTTEEEETTTEEEEHHHH 

SEQ KGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRNGWWQYDERT 
SEG 
1rmd- HHHHHHccBTTTTTCBCGGG-cBcc 

SEQ SRELEDAFSKGKKNTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKRDIIDIPKKGVAGL 
SEG . xxxxxxxxxxxxxxx . 
1rmd- 

SEQ RLDCDANTVNLARESSADGADSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDAST 
SEG xxxxxxxxxxxx 
1rmd- 

SEQ SLEDSFAHLQLSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA 
SEG ^ - xxxxxxxxxxxxxxxxxxxx 
1rmd- 

SEQ VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPDGQCTVTEV 
SEG xxxxxxxxxxxxxxxxxxxx 
1rmd- 


Prosite for DKFZphtes3_27ol4 . 1 


PS00001 21->25 ASN_GLYCOSYLATION PDOC00001 
PS00001 318->322 ASN_GLYCOSYLATION PDOC00001 
PS00004 132->136 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005 
PS00005 120->123 PKC_PHOSPHO_SITE PDOC00005 
PS00005 217->220 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 274->277 PKC_PHOSPHO_SITE PDOC00005 
PS00005 325->328 PKC_PHOSPHO_SITE PDOC00005 
PS00005 330->333 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 120->124 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 222->226 CK2_PHOSPHO_SITE PDOC00006 
PS00006 240->244 CK2_PHOSPHO_SITE PDOC00006 
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006 
PS00006 287->291 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006 
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 
PS00007 98->107 TYR_PHOSPHO_SITE PDOC00007 
PS00008 329->335 MYRISTYL PDOC00008 
PS00008 337->343 MYRISTYL PDOC00008 
PS00009 66->70 AMIDATION PDOC00009 
PS00009 130->134 AMIDATION PDOC00009 
PS00009 159->163 AMIDATION PDOC00009 
PS00518 51->61 ZINC_FINGER_C3HC4 PDOC00449 
Pfam for DKFZphtes3_27ol4 . 1 
HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCTrrW CPmC* 

C+IC L + P++LPC+H+FCY C++ C +C 

Query 36 CAICLQTCVHPVSLPCKHVFCYLCVKGASWLGKRCALC 73 
DKFZphtes3_28dl4 


group: testes derived 

DKFZphtes3_28dl4 encodes a novel 97 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 1279 bp 

Poly A stretch at pos. 1232, no polyadenylation signal found 

1 GGAGCTCAGA AGTTGGGCAA AGGTCACAGC AGACTTCCTG AAAAGCAGAC 
51 ACTGAGGAAC ACAGTGGAGA GCGGGAGTTC ACAGCGACGC AGCTGAGGAC 

101 GACGCAGGAC CTCTCCCAAA GGTGCTGCAG CTCCAGCACC AGGGGCCAGG 

151 GCTGCGGCGA CAGCAGCTCA GCAACCCTTG CTGTGCTCAA GTTCTTGGGG 

201 ATTCAGAGCT AAGTTCAAAA TTTAGAAACA GTGCCTTAAA GACGGGCAAG 

251 AAAACCCGGT GTGGGAGTCT GCTCATCTAT GGTTTGTTAC TGCTCTCGCT 

301 TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG 

351 CAGAGTGAGG AGGAGACAGG AGCGTGTGCA CCTTCCATCT GTGAGAGGCA 

401 CACTTCAGTC TGGGTTCAAG ATGCAGAATG GTGCCTACAG CAAAAAAAAA 

451 AAAAACACCC TCCTCCCTTC TTTACCATTT GAATGGACAT TTTCCTTACC 

501 TGTGATCCCA ACAGAAACAG ATCCAGACCT ATCATGTGAA GTCCACGTTC 

551 CAGGATCAGA AGTAACCAGT TTATGGACTG AGCTTACACG GGAAAGTCTA 

601 CCCCCGACTC CTTCTGGATA GTAACATACA CAGCTGCATA AAAACGTCTC 

651 CAAGGGGACA TACGATGCAT TTGCTTGGTG TCCCAGCCAA GCTCCCCACC 

701 GGCGACCTCA CTGTTCCTTA GAGCTCGAGA GCTCGTCTCC TATCAATCAG 

751 AGAACCCCAT CAGCTGTGAC CAACAGAGCT GGAGCCCTCT GTGGAGGGAG 

801 CTGACCCCAC ACACAGGACA GAGCAGAATC CTGATTATTT TACAAACTGC 

851 AAACCTTCTG AGTAAGAAGA CAAAAATATA CATTCCAAGG TATCTGTAAA 

901 GTGCTTGGAA GATGCAGACA GCTGCACCGA GGGGCTCTGA TCCATCCACA 

951 CGCTGCGCTT TGCTGCGGTC ACACACACGG TCTCAGTCAC GTGATGGTTT 
1001 TGCTTTTATT TCTTAAACGG CTGAGTGATA ATCCAGCTAG TGTGCAGTCA 
1051 TTTCATACCT TTCAATGGGC GTCACCGCAG TGACGCTGCC CCAGCCCCAT 
1101 GCTGAGGGCC GACACAATTC ACGGAACAGA TTCATCATAT TTGGTCTTTA 
1151 TGTAAATAAT AAATGTTTTA AAATTGCCTA AATATAAAAA AAAAAAAAAA 
1201 AAAAAAAAAA AAAAAAAAAA AAAGGGCGGC CGAAAAAAAA AAAAAAAAAA 
1251 AAAAAAAAAA AAAAAAAAAA GGGCGGCCG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 328 bp to 618 bp; peptide length: 97 
Category: putative protein 


1 MKKPSERGRV RRRQERVHLP SVRGTLQSGF KMQNGAYSKK KKNTLLPSLP 
51 FEWTFSLPVI PTETDPDLSC EVHVPGSEVT SLWTELTRES LPPTPSG 
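
As a cross-check of the ORF start, the first 33 bases of the ORF (positions 328 to 360 of the nucleotide listing above) can be translated with the standard genetic code. This is a minimal sketch: the table construction and the function name are assumptions made for illustration, not the method used to produce the report.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame, ignoring spaces and any trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 328-360 as printed in the sequence listing for this clone.
orf_start = "ATG AAAAAGCCAA GTGAACGTGG CAGAGTGAGG"
print(translate(orf_start))
```

The translated fragment corresponds to the first eleven residues of the peptide shown above.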


BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_28dl4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_28dl4, frame 1 


Report for DKFZphtes3_28dl4 .1 


[LENGTH] 97 

[MW] 10945.56 

[pI] 9.80 

[PROSITE] MYRISTYL 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 12.37 % 

SEQ MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI 
SEG xxxxxxxxxxxx 


PRD cccccchhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccc 

SEQ PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG 

SEG 

PRD ccccccccceeeecccccchhhhhhhhhhcccccccc 


Prosite for DKFZphtes3_28dl4 . 1 


PS00004 2->6 CAMP_PHOSPHO_SITE PDOC00004 

PS00004 41->45 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 

PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005 

PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005 

PS00006 62->66 CK2_PHOSPHO_SITE PDOC00006 

PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006 

PS00008 24->30 MYRISTYL PDOC00008 

PS00008 76->82 MYRISTYL PDOC00008 
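
The hits in the table above can be reproduced by translating the four PROSITE patterns involved into regular expressions and scanning the 97-residue peptide. The sketch below assumes that the table's coordinates are a 1-based start followed by the position one past the last matched residue, which is what the listed ranges suggest. The dictionary and function names are illustrative.

```python
import re

# PROSITE patterns PS00004, PS00005, PS00006 and PS00008 written as regexes.
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

# Peptide assembled from the two SEQ lines of the report for this clone.
PEPTIDE = (
    "MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI"
    "PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG"
)


def scan(peptide, regex):
    """Return (start, end) pairs; start is 1-based, end is one past the match."""
    hits = []
    # A lookahead lets overlapping matches be reported as well.
    for m in re.finditer("(?=(" + regex + "))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{name} {start}->{end}")
```

The lookahead matters for the two casein kinase II sites, which overlap (62->66 and 64->68); a plain `re.finditer` over the pattern would report only the first.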


(No Pfam data available for DKFZphtes3_28dl4 .1) 
DKFZphtes3_2all 


group: testes derived 

DKFZphtes3_2all encodes a novel 1048 amino acid protein with very weak similarity to mucins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific genes. 

similarity to mucin 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 4082 bp 

Poly A stretch at pos. 4060, polyadenylation signal at pos. 4034 

1 GAGGACTGCG AGCACAGCGG CGGCCGGGTG GCGGGGGTGA GTGGGGCCAG 
51 CGGGGCTGGA CAGCAGCGGG CCCCGGGCGC CGCCGCCGCG ATCCCTCCCC 
101 GCGCCCGCCG AGCACATCGC CGCCGCCGAG ATGGGCCCTC CGCGGCACCC 
151 CCAGGCCGGC GAGATAGAAG CGGGCGGTGC GGGCGGCGGG CGGCGGCTAC 
201 AGGTGGAAAT GAGTTCTCAA CAGTTTCCTC GGTTAGGAGC CCCTTCTACC 
251 GGGCTGAGCC AGGCCCCTTC TCAGATTGCA AACAGTGGTT CTGCTGGATT 
301 GATAAACCCA GCTGCTACAG TCAATGATGA ATCTGGTCGA GATTCTGAAG 
351 TCAGTGCCAG GGAGCACATG AGTTCCAGCA GCTCCCTCCA GTCCCGGGAG 
401 GAGAAGCAAG AGCCTGTTGT GGTAAGGCCC TATCCACAGG TGCAGATGTT 
451 GTCGACACAC CATGCTGTCG CATCAGCCAC ACCTGTTGCA GTGACAGCCC 
501 CGCCAGCACA CCTGACGCCA GCAGTGCCAC TTTCATTTTC GGAGGGACTT 
551 ATGAAGCCGC CCCCGAAGCC CACCATGCCT AGCCGTCCCA TTGCTCCTGC 
601 TCCACCTTCT ACCCTGTCAC TTCCCCCCAA GGTTCCAGGG CAGGTTACCG 
651 TTACCATGGA GAGTAGCATC CCTCAAGCTT CAGCCATTCC TGTGGCAACA 
701 ATCAGTGGAC AACAGGGCCA TCCCAGTAAC CTGCATCACA TCATGACTAC 
751 AAATGTGCAA ATGTCTATCA TCCGCAGCAA TGCTCCTGGG CCCCCTCTTC 
801 ACATTGGAGC TTCTCATTTA CCTCGAGGTG CAGCTGCTGC TGCTGTGATG 
851 TCCAGTTCTA AAGTAACCAC AGTCCTGAGG CCGACCTCAC AGCTGCCAAA 
901 TGCTGCTACT GCTCAGCCAG CAGTACAGCA CATCATTCAC CAACCAATCC 
951 AGTCTCGGCC ACCTGTGACC ACCTCCAATG CCATCCCTCC TGCTGTGGTA 
1001 GCAACTGTCT CAGCCACCAG AGCTCAGTCT CCAGTCATCA CTACGACAGC 
1051 GGCGCATGCT ACTGATTCAG CACTTAGTAG GCCAACCTTG TCTATCCAGC 
1101 ATCCTCCATC TGCAGCAATC AGTATTCAGC GTCCTGCCCA GTCACGAGAT 
1151 GTCACAACAA GAATCACACT ACCATCTCAC CCTGCATTAG GGACGCCAAA 
1201 ACAGCAGCTT CATACAATGG CTCAGAAAAC AATCTTCAGT ACTGGCACGC 
1251 CAGTGGCTGC AGCCACAGTA GCACCTATTT TGGCAACCAA CACCATTCCT 
1301 TCAGCGACCA CAGCTGGATC TGTGTCACAC ACGCAAGCTC CCACAAGTAC 
1351 CATTGTTACC ATGACAGTAC CCTCCCATTC CTCCCATGCT ACTGCTGTGA 
1401 CCACCTCAAA CATCCCAGTC GCCAAGGTGG TGCCCCAGCA GATCACGCAC 
1451 ACTTCTCCTC GGATCCAGCC AGACTACCCT GCCGAGAGGA GTAGCCTGAT 
1501 TCCCATCTCC GGACATCGGG CCTCTCCCAA TCCTGTGGCC ATGGAAACCC 
1551 GAAGTGACAA CAGACCGTCT GTTCCCGTTC AGTTCCAATA TTTTTTGCCA 
1601 ACTTACCCCC CTTCTGCATA CCCACTGGCG GCACATACCT ACACCCCAAT 
1651 CACCAGTTCC GTGTCCACTA TCCGACAGTA TCCAGTTTCA GCTCAGGCTC 
1701 CAAACTCTGC CATCACAGCT CAGACTGGTG TTGGGGTAGC GTCTACCGTC 
1751 CACCTAAACC CCATGCAGTT GATGACAGTG GATGCATCGC ATGCTCGACA 
1801 TATTCAAGGG ATCCAGCCAG CACCCATCAG TACCCAGGGT ATCCAGCCGG 
1851 CCCCCATTGG GACCCCAGGG ATACAGCCTG CACCACTTGG CACACAGGGA 
1901 ATTCACTCAG CAACCCCAAT CAACACACAA GGGCTTCAGC CTGCACCTAT 
1951 GGGTACTCAG CAGCCTCAGC CTGAAGGAAA GACTTCAGCA GTGGTGTTGG 
2001 CAGATGGAGC CACAATTGTG GCCAACCCTA TTAGCAATCC ATTCAGTGCT 
2051 GCTCCAGCAG CAACAACCGT GGTGCAGACC CACAGCCAGA GTGCTAGCAC 
2101 CAACGCTCCC GCCCAGGGCT CATCGCCACG GCCAAGCATA CTCCGGAAGA 
2151 AACCTGCCAC AGATGGTGCC AAACCCAAGT CTGAAATCCA CGTGTCTATG 
2201 GCCACTCCGG TCACTGTGTC CATGGAGACT GTATCCAATC AAAATAATGA 
2251 TCAGCCTACC ATTGCCGTCC CTCCAACTGC CCAGCAGCCC CCACCGACCA 
2301 TTCCAACTAT GATTGCAGCA GCCAGTCCCC CGTCACAACC AGCCGTTGCC 
2351 CTTTCAACCA TTCCTGGAGC GGTCCCCATC ACTCCACCCA TCACCACCAT 
2401 TGCAGCTGCA CCACCTCCAT CAGTCACTGT GGGTGGCAGT CTTTCCTCCG 
2451 TCTTGGGCCC TCCCGTTCCT GAAATTAAAG TGAAAGAAGA AGTAGAACCA 
2501 ATGGATATCA TGAGGCCAGT TTCTGCAGTT CCTCCACTGG CTACCAACAC 
2551 TGTGTCTCCA TCTCTTGCAT TGCTGGCAAA CAACTTGTCC ATGCCTACAA 
2601 GTGACCTACC ACCTGGTGCC TCCCCAAGGA AAAAGCCTCG AAAGCAACAG 
2651 CATGTGATCT CAACAGAAGA AGGTGACATG ATGGAGACAA ACAGCACTGA 
2701 TGATGAGAAG TCCACTGCCA AGAGTCTTCT GGTGAAGGCT GAGAAGCGCA 
2751 AGTCTCCTCC CAAGGAGTAT ATTGATGAGG AAGGTGTGAG ATATGTCCCA 
2801 GTGCGTCCAA GACCCCCCAT TACTTTGCTT CGTCACTATC GGAACCCCTG 
2851 GAAAGCTGCT TACCACCACT TTCAGAGGTA CAGTGACGTC CGGGTCAAAG 
2901 AGGAGAAGAA AGCTATGCTG CAGGAAATAG CTAATCAGAA AGGAGTATCC 
2951 TGTCGTGCTC AAGGCTGGAA AGTCCACCTC TGTGCTGCCC AGTTACTACA 
3001 GCTGACGAAT CTAGAACATG ATGTCTATGA AAGACTTACT AACCTGCAGG 
3051 AAGGGATTAT CCCAAAGAAA AAAGCAGCAA CAGATGATGA TCTCCACCGA 
3101 ATAAACGAAC TGATACAGGG AAATATGCAG AGGTGTAAAC TTGTGATGGA 
3151 TCAAATCAGT GAAGCCAGAG ACTCCATGCT TAAGGTTTTA GATCATAAAG 
3201 ACCGTGTCCT GAAGCTGCTT AACAAGAACG GGACTGTCAA AAAAGTGTCC 
3251 AAATTGAAGC GAAAGGAAAA AGTCTAGACC CAGAACAATC AGGAGATTGG 
3301 AAGCAAATTT ATGAAGAATG ATGGTGGGGG TGGGGGGAGG GTTTTGGTTT 
3351 TTTCCAAAGT GGAACATTGA AATAAAGGAA GTGTTCCTTA GTTCCCGTGT 
3401 GAAAGCAGAG GAACCCATGA CATCCAAGGG CGTGAAAGGA TCAGAGCTGA 
3451 CTGGACATAG TGAGCTGCCT TCTTGCGTTC GGGTGCACCC CTGTTAAACC 
3501 TGATCTGTGT CATAAGTGAC TCCGGATGCA TCAGTGTCCA CCAGTTGGAA 
3551 GCAATGACAA GGATGGCTGG CTGGTGTTTT TCAGCCTTCC GGTTTATAGA 
3601 CTGTATTTAT CTAGTGGATT CCTGCAGGCC CCATACTGAG CCTGGACTGA 
3651 AAGTATCCAC TCGGACCATC TGTTATCTCT CTACACTGAA AATAAAACCT 
3701 CTTCCACCCA CCCCATTCGG TTCTTCTGCC TGACCTTCAA ATGCCCATGT 
3751 TGGCCTTTTA CAGCAGTGCC ACGGCACCAA GCGAGCTGCC ACATCTCACA 
3801 CTCTAAAGGG TTTGAACTAT TAGTTCTTGT CATTTTTTAA AAAAAACCAT 
3851 TCCCAAGTGA AATTGTTATA TCGTCTGTCT TGCGTGTGTC AGAACTGGGT 
3901 TTTTGTGGAG GTTCAGAGCA GGCAACACCA TAAGTTGCTC TCAGATCCTT 
3951 GTTCTGAAGT ACATTCTTGG TTATCTGTAC TTCTGTAGCT GGTGTGATGC 
4001 TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG 
4051 AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA 
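
The reported poly A and polyadenylation signal positions for this insert can be checked against the last 82 bases of the listing (positions 4001 to 4082). The sketch below assumes that the canonical AATAAA hexamer is the signal meant, that a run of at least ten A residues counts as the poly A stretch, and that the reported positions are 0-based offsets. None of these three points is stated in the report itself.

```python
import re

# Bases 4001-4082 as printed in the last two lines of the sequence listing.
tail = (
    "TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG"
    "AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA"
).replace(" ", "")

OFFSET = 4000  # 0-based offset of the first base of `tail` within the insert

# First occurrence of the canonical polyadenylation hexamer.
signal_pos = OFFSET + tail.find("AATAAA")

# First run of ten or more A residues (threshold is an assumption).
poly_a_match = re.search(r"A{10,}", tail)
poly_a_pos = OFFSET + poly_a_match.start()

print("polyadenylation signal at", signal_pos)
print("poly A stretch at", poly_a_pos)
```

Under these assumptions both values coincide with the positions given in the clone summary (4034 and 4060).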


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 131 bp to 3274 bp; peptide length: 1048 
Category: similarity to known protein 


1 MGPPRHPQAG EIEAGGAGGG RRLQVEMSSQ QFPRLGAPST GLSQAPSQIA 
51 NSGSAGLINP AATVNDESGR DSEVSAREHM SSSSSLQSRE EKQEPVVVRP 
101 YPQVQMLSTH HAVASATPVA VTAPPAHLTP AVPLSFSEGL MKPPPKPTMP 
151 SRPIAPAPPS TLSLPPKVPG QVTVTMESSI PQASAIPVAT ISGQQGHPSN 
201 LHHIMTTNVQ MSIIRSNAPG PPLHIGASHL PRGAAAAAVM SSSKVTTVLR 
251 PTSQLPNAAT AQPAVQHIIH QPIQSRPPVT TSNAIPPAVV ATVSATRAQS 
301 PVITTTAAHA TDSALSRPTL SIQHPPSAAI SIQRPAQSRD VTTRITLPSH 
351 PALGTPKQQL HTMAQKTIFS TGTPVAAATV APILATNTIP SATTAGSVSH 
401 TQAPTSTIVT MTVPSHSSHA TAVTTSNIPV AKVVPQQITH TSPRIQPDYP 
451 AERSSLIPIS GHRASPNPVA METRSDNRPS VPVQFQYFLP TYPPSAYPLA 
501 AHTYTPITSS VSTIRQYPVS AQAPNSAITA QTGVGVASTV HLNPMQLMTV 
551 DASHARHIQG IQPAPISTQG IQPAPIGTPG IQPAPLGTQG IHSATPINTQ 
601 GLQPAPMGTQ QPQPEGKTSA VVLADGATIV ANPISNPFSA APAATTVVQT 
651 HSQSASTNAP AQGSSPRPSI LRKKPATDGA KPKSEIHVSM ATPVTVSMET 
701 VSNQNNDQPT IAVPPTAQQP PPTIPTMIAA ASPPSQPAVA LSTIPGAVPI 
751 TPPITTIAAA PPPSVTVGGS LSSVLGPPVP EIKVKEEVEP MDIMRPVSAV 
801 PPLATNTVSP SLALLANNLS MPTSDLPPGA SPRKKPRKQQ HVISTEEGDM 
851 METNSTDDEK STAKSLLVKA EKRKSPPKEY IDEEGVRYVP VRPRPPITLL 
901 RHYRNPWKAA YHHFQRYSDV RVKEEKKAML QEIANQKGVS CRAQGWKVHL 
951 CAAQLLQLTN LEHDVYERLT NLQEGIIPKK KAATDDDLHR INELIQGNMQ 
1001 RCKLVMDQIS EARDSMLKVL DHKDRVLKLL NKNGTVKKVS KLKRKEKV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2all, frame 2 

SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2)., N = 1, 
Score = 334, P = 2.4e-25 

PIR:A43932 mucin 2 precursor, intestinal - human (fragments), N = 1 
Score = 321, P = 3.2e-24 

TREMBL: 08 84 40^1 product: "high molecular mass nuclear antigen"; Gallus 
gallus mRNA for high molecular mass nuclear antigen, partial cds N = 
1, Score = 312, P = 8.3e-24 

PIR:S48478 glucan 1, 4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae) , N = 1, Score = 300, P = 2.1e-22 


>SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2) 
Length = 5179 

HSPs: 

Score = 334 (50.1 bits), Expect = 2.4e-25, P = 2.4e-25 
Identities = 184/770 (23%), Positives = 263/770 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T++TVTP TP ++PPPT P 

Sbjct: 3471 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3530 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P+TPPGTT -+PT+GQP+ TTV + 

Sbjct: 3531 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3589 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3590 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3649 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3650 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3706 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3707 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3766 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3767 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3825 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3826 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3874 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 3875 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3932 

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613 

P P TQ PI T P p GTQ + TPI T P P GTQ P 

Sbjct: 3933 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3991 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

P T+ V T P + P + T T T +Q+ +T ++ P+ 

Sbjct: 3992 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4051 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 4052 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4111 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPS VTVGGSLSSVLGP-PVPEI 782 

P+ T P PIT TT+ P P+ T + ++ + P p p 

Sbjct: 4112 TTTVTPTPTPTGTQT-PTTTPITTT-TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 4169 

Query: 783 KVKEEVEPMDIMRPVSAVP-PLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

P+ V+ P P T T P+ A + TS+ PP +S + R 

Sbjct: 4170 TQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229 

Query: 842 VISTEEGDMMET 853 

+ TE T 
Sbjct: 4230 PL-TESTTLLST 4240 

Score = 328 (49.2 bits), Expect = 1.0e-24, P = 1.0e-24 
Identities = 180/745 (24%), Positives = 254/745 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T+ + TVTP TP ++PPPT P 

Sbjct: 3540 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3599 
Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3600 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3658 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3659 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3718 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3719 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3775 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3776 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3835 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3836 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3894 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3895 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3943 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 3944 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4001 

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4002 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4060 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 4061 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4120 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 4121 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4180 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAA-PPPSVTVGGSLSSVLGPPVPEIKVKEE 787 

P+ TP T PI + + PPP + + S P + 

Sbjct: 4181 TTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSSPLTESTTLLST 4240 

Query: 788 VEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMP--TSDLPPGASPR 833 

+ P M S PP +T T +P+ + LS P T+ PPG R 

Sbjct: 4241 LPPAIEM--TSTAPP-STPT-APTTTSGGHTLSPPPSTTTSPPGTPTR 4284 

Score = 325 (48.8 bits), Expect = 2.2e-24, P = 2.2e-24 
Identities = 186/782 (23%), Positives = 261/782 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T+ + TVTP TP ++PPPT P 

Sbjct: 3494 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3553 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3554 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3612 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 3613 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3672 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 3673 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3729 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P. + TT P+ GT + T + T TP T PI 

Sbjct: 3730 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3789 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T 

Sbjct: 3790 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3848 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +p p +T + + P+ + PT P+ 

Sbjct: 3849 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3897 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 
T TPIT++ + T PQP+ITTV T QT 
Sbjct: 3898 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3955 

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3956 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4014 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 4015 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4074 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 4075 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4134 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P I V 
Sbjct: 4135 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT — GTQT PTTTPITTTTTV 4184 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQHVISTEEG 848 

P PP T+T +P L+N PSP+ P + + + 

Sbjct: 4185 TPTPTPTGTQTGPPTHTST-APIAELTTSN-PPPESSTPQTSRSTSSPLTESTTLLSTLP 4242 

Query: 849 DMMETNSTDDEKSTAKSLLVKAEKRKSPP 877 

+E ST + SPP 
Sbjct: 4243 PAIEMTSTAPPSTPTAPTTTSGGHTLSPP 4271 

Score = 324 (48.6 bits), Expect = 2.8e-24, P = 2.8e-24 

Identities = 170/717 (23%), Positives = 248/717 (34%) 

Query: 95 PVVVRPYPQVQMLSTHHAVASATP--VAVTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSR 152 

P P P +T + +P T PP TP+ P++ + + P P+ P 

Sbjct: 1401 PPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPL-PTTTPSPPIS 1459 

Query: 153 PIAPAPPSTLSLPPKVPGQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

PP+T PP TS +PT+ P I + 

Sbjct: 1460 TTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTPITPPASTTT 1516 

Query: 213 IIRSNAPGPPLHIGASHLPRGAAAAAVMSSSKVTTVLRPTSQ--LPNAATAQPAVQHIIH 270 

+ + PPP +P S T+ PTS LP T P 

Sbjct: 1517 LPPTTTPSPPTTTTTTPPP TTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTT 1571 

Query: 271 QPIQSRP-PVTTSNAIPPAVVATVSA-TRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

P + P P TT+ PP+T T SP TTT + S PT + PP++ 

Sbjct: 1572 PPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTS 1631 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNT 388 

++ T T P P TP T I +T TP T + + T 

Sbjct: 1632 TTTLPPTTTPSPPPTTTTTP--PPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTTP 1689 

Query: 389 IPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAV-TTSNIPVAKVVPQQITHTSPRIQP 447 

P TT + S T P+S I T T PS ++ + TT P P T T + P 

Sbjct: 1690 SPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPV-QFQYFLPTYPPSAY-P —LA 500 

+ + P+ P T + P VP+ + +L + P+ + P L 

Sbjct: 1750 TTTSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTEH 1809 

Query: 501 AHTYTPITSSVSTIR--QYP-VSAQAPNSAITAQTGVG-VASTVHLNPMQLMTVDASHAR 556 

P ++ + R YP V + VG + P ++ + A 

Sbjct: 1810 GDVCGPGWAANISCRATMYPDVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPM-AFCLN 1868 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQ-PAPLGTQGIHSATPINTQGLQPAPMGTQQPQ-- 613 

+ +Q TQ P+T +PPTI+T+ PP GTQ P 

Sbjct: 1869 YEINVQCCECVTQ PTTMTTTTTENPTPPTTTPITTTTTVTPT PTPTGTQTPTTT 1922 

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TWQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 1923 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 1982 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI A 729 

T P+ TP +T+ TPPTQPTP 

Sbjct: 1983 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2042 

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2043 TTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTPT 2096 

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2097 PTGTQTPTTT-PITTTTTVTPT 2117 
Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2068 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2127 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2128 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2186 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 2187 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2246 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 2247 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2303 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2304 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2363 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T 

Sbjct: 2364 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2422 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 
+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2423 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2471 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2472 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2529 

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2530 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2588 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 
PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2589 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2648 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2649 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2708 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2709 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT— GTQTPTTTPITTTTTVTPTP 2762 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2763 TPTGTQTPTTT-PITTTTTVTPT 2784 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 

Sbjct: 2206 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2265 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2266 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2324 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2325 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2384 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2385 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2441 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2442 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2501 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 
T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T 
Sbjct: 2502 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2560 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2561 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2609 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2610 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2667 

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2668 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2726 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2727 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2786 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2787 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2846 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2847 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTP 2900 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2901 TPTGTQTPTTT-PITTTTTVTPT 2922 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 

Sbjct: 2321 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2380 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2381 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2439 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 2440 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2499 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P
Sbjct: 2500 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2556

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2557 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2616 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P++ TTT VP TT 

Sbjct: 2617 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2675 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2676 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 2724

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 2725 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2782

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2783 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2841 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2842 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2901 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 2902 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2961 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2962 TTTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3015

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

P P + P T TV+P+ 

Sbjct: 3016 TPTGTQTPTTT-PITTTTTVTPT 3037 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)


96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154
V PP T ++TVTP TP + +PPPT P 

2390 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2449

155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 
P +T P PGTT +PT+GQP+ TTV + 

2450 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2508

213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T T P I 

2509 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2568 

269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328
+ PT P T + T+PTT T + T++ P 
2569 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2625

329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385
Q P + TT P+ GT + T + T TP T PI

2626 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2685

386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443
T T+ P+ T G+ + T P +T T+T P++TTT VPTT 
2686 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2744 

444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 
+ P ++ + +P P +T + + P+ + PT P+ 

2745 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 2793

503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560
T TPIT++ + T P QP+ITTV T QT 
2794 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2851

561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613
P P TQ PI T P P GTQ + TPI T P P GTQ P 

2852 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2910 

614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671
PT+V T P+P+ TTT +Q+ +T H- P + 

2911 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2970

672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

2971 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3030 

729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 
P+ TP PIT TT P P+ T G+ + P V 
3031 TTTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3084

789 EPMDIMRPVSAVPPLATNTVSPS 811 
P P + P T TV+P+ 

3085 TPTGTQTPTTT-PITTTTTVTPT 3106 



Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T++TVTP TP ++PPPT P 

Sbjct: 2459 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2518 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2519 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2577 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 2578 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2637 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 2638 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2694

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2695 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2754 
Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T 

Sbjct: 2755 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2813 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2814 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 2862

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 2863 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2920

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P p GTQ P 

Sbjct: 2921 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2979 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2980 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3039 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 3040 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3099 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3100 TTTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3153

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

P P + P T TV+P+ 

Sbjct: 3154 TPTGTQTPTTT-PITTTTTVTPT 3175 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

VPP T++TVTP TP ++PPPT P 

Sbjct: 2528 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2587 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2588 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2646 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 2647 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2706

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 2707 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2763

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2764 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2823 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T 

Sbjct: 2824 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2882 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2883 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 2931

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 2932 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2989

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2990 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3048 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

P T+ V T P + P + T T T +0+ +T ++ P+ 

Sbjct: 3049 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3108 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 3109 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3168

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 
P+ TP PIT TT P P+ T G+ + P V
Sbjct: 3169 TTTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3222

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3223 TPTGTQTPTTT-PITTTTTVTPT 3244 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3080 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3139 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3140 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3198 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T T P I 

Sbjct: 3199 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3258 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 3259 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3315

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3316 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3375 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3376 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3434

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3435 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 3483

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 
T rPIT++ + T PQP+ITTV T QT 

Sbjct: 3484 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3541

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3542 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3600 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 3601 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3660 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 3661 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3720 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3721 TTTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3774

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+

Sbjct: 3775 TPTGTQTPTTT-PITTTTTVTPT 3796 

Score = 313 (47.0 bits), Expect = 4.2e-23, P = 4.2e-23
Identities = 169/695 (24%), Positives = 245/695 (35%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3655 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3714 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PTtGQP+ TTV + 

Sbjct: 3715 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3773 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 3774 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3833 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT p T + T+PTT T + T++ P 
Sbjct: 3834 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3890


329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385
Q P + TT P+ GT + T + T TP T PI 

3891 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3950

386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 
T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

3951 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4009 

444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502
+ P ++ + +P P +T + + P+ + PT P+ 

4010 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 4058


503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560
T TPIT++ + T P QP+ITTV T QT 
Sbjct: 4059 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4116

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQP 614

P P TQ PI T P P GTQ + TPI T p p GTQ P 

Sbjct: 4117 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPT- 4174 




615 EGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKK 674
+ T+ P P T ++ ++N P + S+P+ S 

4175 TTPITTT-----TTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229


Query: 675 PATDGAKPKSEIH--VSMATPVTVSMETVSNQNNDQPTIAVPP-TAQQPP--PTIPTMIA 729

PT+ S++M+ ST + T++ PP T PP PT T 

Sbjct: 4230 PLTESTTLLSTLPPAIEMTSTAPPSTPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTG 4289 

Query: 730 AASPPSQPAVALSTI----PGAVPITPP--ITTIAAAP-PPSVTVGGSLSSVLGPPVPEI 782

++S P+ V +T P P++ p I T P P SV + L+ P E+ 

Sbjct: 4290 SSSAPTPSTVQTTTTSAWTPTPTPLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV 4349

Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19
Identities = 138/540 (25%), Positives = 194/540 (35%)


278 PVTTSNAIPPAVVATVSATRAQSPVITTTAAH------ATDSALSRP--TLSIQHPPSAA 329

P+TT+ + P T + T +P+ TTT T + + P T + P 

1946 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2005

330 ISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILAT 386
Q P + TT P+ GT + T + T TP T PI T 

2006 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2065 

387 NTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSPR 444
T+ P+ T G+ + T P +T T+T P++ TTT VP TT + 

2066 TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGTQ 2124 

445 IQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHT 503 
P ++ + +P P +T + + P+ + PT P+ T 

2125 TPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTPT 2173

504 YTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGI 561 
TPIT++ + TPQP+ITTVT QT 
2174 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTP 2231

562 QPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ-- 613
P P TQ PI T P P GTQ + TPI T p p GTQ P 

2232 TPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTT 2290 

614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672
PT+V T P + P+ TTT +Q+ +T ++ p+ 

2291 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2350 

673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMIA 729
T P+ TP +T+ TPPTQPTP 

2351 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2410 

730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 
P+ TP PIT TT P P+ T G+ + p V 

2411 TTVTPTPTPTGTQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPT 2464

790 PMDIMRPVSAVPPLATNTVSPS 811 
P P + P T TV+P+ 

2465 PTGTQTPTTT-PITTTTTVTPT 2485 


Score = 265 (39.8 bits), Expect = 5.8e-18, P = 5.8e-18
Identities = 179/746 (23%), Positives = 257/746 (34%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3678 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3737 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3738 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3796 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T T P I 

Sbjct: 3797 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3856 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P
Sbjct: 3857 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 3913

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI

Sbjct: 3914 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3973 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P T+T P+ + T TT VP T T 

Sbjct: 3974 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4032 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 4033 QTPTTTPITTTTTVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTP 4081

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 4082 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4139

Query: 561 IQPAPISTQGIQPAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQP 614

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4140 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTGPP 4198 

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPA---ATTVVQTHSQSA-STNAPA--QGSSPRP 668

TS +A+ T +NP P S+ P +T+ T S + ST PA S+ P 
Sbjct: 4199 T-HTSTAPIAELTT--SNP--PPESSTPQTSRSTSSPLTESTTLLSTLPPAIEMTSTAPP 4253

Query: 669 SILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMI 728 

S T G S + +p + +-+ PT + T T PT 

Sbjct: 4254 STPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTSAWT-PTPT 4312 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

++P L P +V I 4- AP V G+ + E 

Sbjct: 4313 PLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV-YNGTYGDTCYFVNCSLSCTLEF 4371 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

S P + +T +PS ++ S PT P P P +Q++ 
Sbjct: 4372 YNWSCPSTPSPTPTPSKSTPTPSKP--SSTPSKPTPGTKPPECPDFDPPRQEN 4422

Score = 254 (38.1 bits), Expect = 8.7e-17, P = 8.7e-17
Identities = 167/697 (23%), Positives = 245/697 (35%)


115 SATPVAVTAPPAHLTPAVPLSFSEGLMKPPPK--PTMPSR-PIAPAPPSTLSLPPKV-PG 170
S + T PP TP+ P + + Ppp p+ p+ PI p p ST +LPP P 

1587 SPPTITTTTPPPTTTPSPPTTTTT---TPPPTTTPSPPTTTPITP-PTSTTTLPPTTTPS 1642

171 QVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHL 230 
T + P + PT+ + TT I + PPP + 

1643 PPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPI--TTTPSPPTTTMTTPS 1700

231 PRGAAAAAVMSSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQS-RPPVTTSNAIPPAV 289 
P SS +TT P+S + P P + PP TT +PP 

1701 P TTTPSSPITTTTTPSS TTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPPTT 1751 

290 VATVSATRAQSPVITT-TAAHATDSALSRPTLSIQH----PPSAAISIQRPAQSRDVTTR 344

++ T PITT++++P + + + s ++P ++ 

1752 TSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELIGD 1811 


345 ITLPSHPALGTPKQQLHTMAQKTI FSTGTPVAAATVAPILATN-TIPS ATT AGS 397
+ PA++++ IGV ++N IP A

1812 VCGPGWAANISCRATMYP--DVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPMAFCLNY 1869

398 VSHTQAPTSTI--VTMTVPSHSSHATAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSS 455
+ Q T MT + + + T TT+ I V T T + P ++ 

1870 EINVQCCECVTQPTTMTTTT-TENPTPPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTT 1928 

456 LIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHTYTPITSSVS-T 513 
+ +P P +T + + P+ + PT P+ T TPIT++ + T 

1929 TVT-----PTPTPTGTQTPT----TTPITTTTTVTPTPTPTG--TQTPTTTPITTTTTVT 1977

514 IRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQ 572 
PQP+ITTV T QT PPTQ

Sbjct: 1978 PTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2035

Query: 573 PAPIGTPGI---QPAPLGTQGIHSATPINTQGL---QPAPMGTQQPQ--PEGKTSAVVLA 624

PI T P P GTQ + TPI T P P GTQ P P T+ V 

Sbjct: 2036 TTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT 2094 

Query: 625 DGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPK 683

T P+P+ TT T +Q+ +T ++ P+ TP 

Sbjct: 2095 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2154 

Query: 684 SEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMIAAASPPSQPAVA 740

+ TP +T + T P PT Q P T P P+ 

Sbjct: 2155 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 2214

Query: 741 LSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAV 800 

T P PIT TT P P+ T G+ + P V P P + 

Sbjct: 2215 TQT-PTTTPIT---TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPTPTGTQTPTTT- 2267

Query: 801 PPLATNTVSPS 811 

P T TV+P+ 
Sbjct: 2268 PITTTTTVTPT 2278 

Score = 243 (36.5 bits), Expect = 1.3e-15, P = 1.3e-15
Identities = 110/406 (27%), Positives = 154/406 (37%)

Query: 121 VTAP-PAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESS 179 

+T P P TP+ P + + L P P+ P+ PP+T PP T + ++ 

Sbjct: 1396 ITTPSPPTTTPSPPPTTTTTL-PPTTTPSPPTTTTTTPPPTTTPSPPITT--TTTPLPTT 1452

Query: 180 IPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAV 239 

P P++T + P+ TT + P PP + P 
Sbjct: 1453 TPSP PISTTTTPP — PTTTPSPPTTTPSPP TTTPSPPTTTTTTPPP TT 1498 

Query: 240 MSSSKVTTVLRP---TSQLPNAATAQPAVQHIIHQPIQSRP-PVTTSNAIPPAVVATVSA 295

S +TT + P T+LP TP P+PP TT+ PP T+ 

Sbjct: 1499 TPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPP 1558 

Query: 296 TRAQSPVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDV-TTRITLPSHPALG 354 

T SP TTT + S PT + PP+ + p + TT T P P 

Sbjct: 1559 TTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTP--PPTT 1616

Query: 355 TPKQQLHTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVP 414 

TP T +TP T+PTTPTTSTP+ITTP 

Sbjct: 1617 TPSPPTTTPITPPTSTTTLP-PTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTP 1675 

Query: 415 SHSSHATA-VTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMET 473 

++ ++ ♦TT* P + TSP PP ++PS SPPMT 
Sbjct: 1676 PPTTTPSSPITTTPSPPTTTM----TTPSPTTTPSSPITTTTT-PSSTTTPSPPPTTMTT 1730

Query: 474 RSDNR-PSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNS 526

S PS P LP S+ PL T TP+ S++ PS P + 

Sbjct: 1731 PSPTTTPSPPTTTMTTLPPTTTSS-PL---TTTPLPPSITPPTFSPFSTTTPTT 1780

Score = 189 (28.4 bits), Expect = 8.0e-09, P = 8.0e-09
Identities = 92/374 (24%), Positives = 133/374 (35%)

Query: 439 THTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYF-LPTYPPSAY 497 

T + P P P ++ +p + + p PS P+ LPT PS 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSP- 1456 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556

P++ T P T++ S P S T T +T PM +T AS 

Sbjct: 1457 PISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTT 1516 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P +T P P TP +P T I P +T L P T P P 
Sbjct: 1517 LPPTTTPSPPTTTTTTPPPTTTP SPPTTTPI — TPPTSTTTLPP TTTPSPPP 1566 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP--AQGSSPRPSILRKK 674

T+ ■ T +P P + P+ T+ T +T +P ++P P+ 

Sbjct: 1567 TTTTT PPPTTTPSP PTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSP 1620

Query: 675 PATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAV-PPTAQQPPPTIPTMIAA--A 731

PT P+ + PT + PT PPT P P I T 

Sbjct: 1621 PTTTPITPPTS--TTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPT 1678

Query: 732 SPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPV------PEIKVK 785

+ PS P + P TP TT ++P + T S ++ PP P 

Sbjct: 1679 TTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPS 1738 

Query: 786 EEVEPMDIMRPVSAVPPLATNTVSPSL 812

M + P + PL T + PS + 
Sbjct: 1739 PPTTTMTTLPPTTTSSPLTTTPLPPSI 1765 

Score = 185 (27.8 bits), Expect = 1.6e-09, P = 1.6e-09
Identities = 71/270 (26%), Positives = 99/270 (36%)

Query: 563 PAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATP INTQGLQPAPMGTQQPQ PEG 616

P+P +T P P TP P T + + TP I+T P P T P P 
Sbjct: 1422 PSPPTTTTTTPPPTTTPS-PPITTTTTPLPTTTPSPPISTT-TTPPPTTTPSPPTTTPSP 1479 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ T p + P +P TT + T S +T P SP + P 
Sbjct: 1480 PTTTPSPPTTTTTTPPPTTTP SPPMTTPI-TPPASTTTLPPTTTPSPPTTTTTTPPP 1535 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQ 736 

T P + TP+T T + P+ P T PPPT + PS 

Sbjct: 1536 TTTPSPPT TTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTTPSP 1588 

Query: 737 PAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRP 796 

P + +T P +PP TT PPP+ T ++ + PP + p p 

Sbjct: 1589 PTITTTTPPPTTTPSPPTTT-TTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSP--PP 1645

Query: 797 VSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

P T T SP + T+ PP +P 

Sbjct: 1646 TTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTP 1681 

Score = 183 (27.5 bits), Expect = 3.4e-09, P = 3.4e-09
Identities = 91/390 (23%), Positives = 139/390 (35%)

Query: 326 PSAAISIQRPAQSRDVTTR-ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPIL 384 

PS+P+T TPSPT T I+T TP+ T +p + 

Sbjct: 1399 PSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSPPI 1458 

Query: 385 ATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIP--VAKVVPQQITHTS 442

+T T P TT S T P+ T + P+ ++ TT+ P + P T T 

Sbjct: 1459 STTTTPPPTTTPSPP-TTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTTL 1517 

Query: 443 PRIQPDYPAERSSLIPISGHRASP NPVAMETRSDNRP— SVPVQFQYFLPTYPPSAY 497 

P P++P SPP+T+P+P T PP+ 

Sbjct: 1518 PPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTTPPPTTT 1577 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556

P T TP + + +T P + +P T T +T P +T S 

Sbjct: 1578 PSPPTTTTPSPPTITTTTPPPTTTPSPP TTTTTTPPPTTTPSPPTTTPITPPTSTTT 1634 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P T PPTPPPT TT P p 

Sbjct: 1635 LPPTTTPSPPPTTTTTPPPTTTPS--P-PTTTTPSPPITTTTTPPPTTTPSSPITTTPSP 1691

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ + T ++PI+ + P++TT + +T +P SP + + P 

Sbjct: 1692 PTTTMTTPSPTTTPSSPITT--TTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPP 715 

T + P + + P ++-•- T S + PT P 
Sbjct: 1750 TTTSSPLT TTPLPPSITPPTFSPFSTTTPTTPCVP 1784 

Score = 176 (26.4 bits), Expect = 1.8e-07, P = 1.8e-07
Identities = 101/402 (25%), Positives = 142/402 (35%)

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAP 404

IT PS P TP T -♦■T +P T P T P TT + T P 

Sbjct: 1396 ITTPSPPTT-TPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTP 1454 

Query: 405 TSTIVTMTVPSHSSHATAVTTS-NIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHR 463

+ I T T P ++ + TT+ + P P T T+P P PI+ 

Sbjct: 1455 SPPISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTP--PPTTTPSPPMTTPITPP- 1511

Query: 464 ASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQA 523 

AS + T PS P T PP+ P + T TPIT ST P + + 

Sbjct: 1512 ASTTTLPPTTT PSPPTTTT TTPPPTTTP-SPPTTTPITPPTSTTTLPPTTTPS 1563 

Query: 524 PNSAITAQ TGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTP 579 

P T T +T +P + T Pfp +T P P TP 

Sbjct: 1564 PPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTT TPSPPTTTTTTPPPTTTP 1618 

Query: 580 G IQPAPLGTQGIHSAT PINTQGLQPAPMGTQQPQPEGKTSAVVLADGATIV 630

IPPT + T PT PPTP S + 

Sbjct: 1619 SPPTTTPITP-PTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPP 1677 

631 ANPISNPFSAAPAA-TTVVQTHSQSASTNAP-AQGSSPRPSILRKKPATDGAKPKSEIHV 688 
S+P + P+ TT + T S + + ++P ++P + P T p 
1678 TTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP---T 1734

689 SMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPG 746 
+ +P T +M T+ P p PPT + 4 p+ p V L G 

1735 TTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITPPTFSPF--STTTPTTPCVPLCNWTG 1790


Score = 168 (25.2 bits), Expect = 9.3e-08, P = 9.3e-08 
Identities = 89/387 (22%), Positives = 133/387 (34%) 



448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPI 507
DY + P+ +P+P T + +PP PTPSP TP 

1381 DYKIRVNCCWPMDKCITTPSP---PTTTPSPP--PTTTTTLPPTTTPSP-PTTTTTTPPP 1434

508 TSSVS---TIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPA 564

T++ S T P+ p+ 1+ T +T P T + p+ 
1435 TTTPSPPITTTTTPLPTTTPSPPISTTTTPPPTTT PSPPTTTPSPPTT TPS 1485 

565 PISTQGIQPAPIGTPGI-QPAPLGTQGIHSATPINTQGLQPAPMGTQQPQ---PEGKTSA 620
P +T P P TP P+ + P T P TP P T+ 

1486 PPTTTTTTPPPTTTPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTT 1545

621 VVLADGATIVANPISNPFSAAPAATTVVQTHSQSA-STNAPAQGS----SPRPSILRKKP 675

+ +T P + P TT T + S 4T P+ + +P P+ p 

1546 PITPPTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPP 1605 

676 ATDGAKPKSEIHVS--MATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASP 733
TP S TP+T T + P-l- P T PPPT + 

1606 TTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTT 1664 

734 PSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGP----PVPEIKVKEEVE 789

PS P +T P + PITT + P ++T ++ P p 

1665 PSPPITTTTTPPPTTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPP 1724 

790 PMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 
P + P P T+L ++T+ LPP +P 

1725 PTTMTTPSPTTTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITP 1767 


Score = 154 (23.1 bits), Expect = 2.7e-06, P = 2.7e-06
Identities = 70/277 (25%), Positives = 92/277 (33%)



565 PISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAVVLA 624
PIST PPTPPPT +TP PTPPT + 

1457 PISTT-TTPPPTTTPS--P-PTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTP--ITP 1510

625 DGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP----AQGSSPRPSILRKKPATDGA 680

+T P + P TT T + S T P ++ P+ p T 

1511 PASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTT 1570 

681 KPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQ--PPPTIPTMIAAASPPSQPA 738
P S T T S T++ T PPT PPPT T + P P 

1571 TPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTT-TPSPPTTTPITPP 1629 

739 VALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVS 798 
+ +T+P +PP TT PPP+ T ++ PP+ + 

1630 TSTTTLPPTTTPSPPPTT-TTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTT 1688 

799 AVPPLATNTV-----SPSLALLANNL--SMPTSDLPPGASPRKKP 836

PP T T +PS + S T PP P 

1689 PSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP 1733 


Score = 148 (22.2 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 62/254 (24%), Positives = 89/254 (35%)

Query: 583 PAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAV-----VLADGATIVANPISNP 637

P+P T S P T L P T P P T+ + T P+ 

Sbjct: 1399 PSPPTTTP--SPPPTTTTTLPP----TTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTT 1452

Query: 638 FSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPKSEIHVS--MATPVT 695

+ P +TT T+ + + P SPP+ PT P SM TP+T 

Sbjct: 1453 TPSPPISTTT--TPPPTTTPSPPTTTPSP-PTTTPSPPTTTTTTPPPTTTPSPPMTTPIT 1509

Query: 696 VSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPIT 755 

T + P+ T PP T P+ + p p + +T+P 4.pp T 

Sbjct: 1510 PPASTTTLPPTTTPSPPTTTTTTPPPTTTPS--PPTTTPITPPTSTTTLPPTTTPSPPPT 1567

Query: 756 TIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALL 815 

T PPP+ T ++ PP + PP T P+ + 

Sbjct: 1568 T-TTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPI 1626

Query: 816 ANNLSMPTSDLPPGASPRKKP 836
S T+ LPP +P P 

Sbjct: 1627 TPPTS--TTTLPPTTTPSPPP 1645

Score = 131 (19.7 bits), Expect = 1.2e-03, P = 1.2e-03 
Identities = 112/492 (22%), Positives = 174/492 (35%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T+ + TVTP TP ++PPPT P 

Sbjct: 3977 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 4036 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 4037 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 4095 

Query: 213 IIRSNAPGP— PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ fTT T T P I 

Sbjct: 4096 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 4155 

Query: 269 IHQPIQSRPPVTTSNAIPPA— VVATVSATRAOSPVITTTA— AHATDSALSRPTLSIQH 324 

+ PT P + T + T +PTT H + + ++TS 
Sbjct: 4156 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPP 4215 

Query: 325 PPSAAISIQRPAQS— RDVTTRI-TLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVA 381 

P S+ R S -r TT + TLP PA+ + T T + T T++ 
Sbjct: 4216 PESSTPQTSRSTSSPLTESTTLLSTLP— PAI EMTSTAPPSTPTAPTTTSGGHTLS 4 269 

Query: 382 PII*ATNTIPSAT-TAGSVS-HTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVVPQQIT 439 

P +TTP TTG++ 4- APT + V T S A T + P++ P I 

Sbjct; 4270 PPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTS AWTPTPTPLS—TPSIIR 4321 

Query: 440 HTSPRIQPDYPAERSSLIPISGHRASPNP-VAHETRSDN RPSVPVQFQYFLPTYp- 493 

T ++P YP+ ++ +P V T D S+ +++ + p 

Sbjct: 4322 TTG— LRP-YPSSVLICCVLNDTYYAPGEEVYNGTYGDTCYFVNCSLSCTLEFYNWSCPS 4378 

Query: 494 -PSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDA 552 

PSP+ + TPSS+ P P TL +T 

Sbjct: 4379 TPSPTPTPSKS-TPTPSKPSSTPSKPTPGTKPPECPDFDPPRQENETWWLCOCFMATCKY 4437 

Query: 553 SHARHIQGIQ PAPISTQGIQPAPIGTP 579 

++ I ++ P P + G+QP + P 
Sbjct: 4438 NNTVEIVKVECEPPPMPTCSNGLQPVRVEDP 44 68 

Score = 117 (17.6 bits), Expect = 1.8e-02, P = 1.8e-02 
Identities = 41/156 (26%), Positives = 55/156 (35%) 

Query: 710 TIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGG 769 

T + P T PPPT T + + PS P +T P +PPITT P P+ T 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITT-TTTPLPTTTPSP 1456 

Query: 770 SLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANKLSMPTSDLPPG 829 

+S+ PP P P + P T T SP T+ PP 

Sbjct: 1457 PISTTTTPP PTTTPSPPTTTPSPPTTTPSPPTTTTTTP-PPTTTPSPPM 1504 

Query: 830 ASPRKKPRKQQHVISTEEGDMMETNSTDDEKSTAKS 865 

+P P + T T +T +T S 

Sbjct: 1505 TTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPS 1540 

Score = 61 (9.2 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 23/93 (24%), Positives = 41/93 (44%) 

Query: 397 SVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVV PQQITHTSPRIQPDYPAE 452 

S++ + +T T+T+P+ + T TT+ P + V P+ SI D+P+ 

Sbjct: 1257 SITTRP3TLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKLCCLWSDWINEDHPSS 1316 

Query: 453 RSS LIPISGHRASPNPVAMETRSDNRPSVPVQ 484 

S P G +P + E RS P + ++ 

Sbjct: 1317 GSDDGDREPFDGVCGAPEDI— ECRSVKDPHLSLE 1349 

Score = 50 (7.5 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 16/41 (39%), Positives = 19/41 (46%) 

Query: 334 RPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTP 374 

RP+ TT ITLP+ P T T T+ ST TP 

Sbjct: 1261 RPSTLTTFTT-ITLPTTPTSFTTTTTTTTPTSSTVLST-TP 1299 

Score = 46 (6.9 bits), Expect = 5.4e-08, P = 5.4e-08 
Identities = 24/106 (22%), Positives = 37/106 (34%) 

Query: 324 hppsaaisiqrpaqsrdvttritlpshpalgtpkqqlhtmaqktifstgtpvaaatvapi 383 

+ PP A++ ++ST+PGQA G I 
Sbjct: 1196 YPPGASVPTEETCKSCVCTNSSQVVCRPEEGKILNQTQDGAFCYWEICGPNGTVEKHFNI 1255 

Query: 384 LATNTIPSA-TTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNI 428 

+ T PS TT + + + PTS T T + +S TT + 

Sbjct: 1256 CSITTRPSTLTTFTTZTLPTTPTSFTTrTTTTTPTSSTVLSTTPKL 1301 

Score = 44 (6.6 bits), Expect = 8.7e-08, P = 8.7e-08 
Identities = 14/34 (41%), Positives = 17/34 (50%) 

Query: 478 RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV 511 

RPS F LPT P S + T TP +S+V 

Sbjct: 1261 RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV 1294 

Pedant information for DKFZphtes3_2all, frame 2 

Report for DKFZphtes3_2all.2 

[LENGTH] 1048 
[MW] 110324.04 
[pI] 9.83 
[HOMOL] PIR:147141 gastric mucin (clone PGM-2A) - pig (fragment) 8e-15 
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR420w] 4e-09 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR420w] 4e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 4e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGR014w] 1e-05 
[FUNCAT] 11.01 stress response [S. cerevisiae, YHL028w] 1e-04 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YHL028w] 1e-04 
[EC] 3.2.1.3 Glucan 1,4-alpha-glucosidase 3e-08 
[PIRKW] glycosidase 3e-08 
[PIRKW] transmembrane protein 3e-08 
[PIRKW] polysaccharide degradation 3e-08 
[PIRKW] glycoprotein 9e-08 
[PIRKW] calcium binding 9e-08 
[PIRKW] hydrolase 3e-08 
[PIRKW] cytoskeleton 7e-08 
[SUPFAM] equine herpesvirus glycoprotein X 2e-07 
[SUPFAM] yeast glucan 1,4-alpha-glucosidase homolog 3e-08 
[SUPFAM] polymorphic epithelial mucin 7e-08 
[SUPFAM] glucan 1,4-alpha-glucosidase homology 3e-08 
[SUPFAM] equine herpesvirus 1 glycoprotein homology 2e-07 
[PROSITE] MYRISTYL 9 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] Irregular 
[KW] LOW_COMPLEXITY 20.04 % 


SEQ MGPPRHPQAGEIEAGGAGGGRRLQVEMSSQQFPRLGAPSTGLSQAPSQIANSGSAGLINP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeeccccccccccccccccccccccccccccccccc 

SEQ AATVNDESGRDSEVSAREHMSSSSSLQSREEKQEPVVVRPYPQVQMLSTHHAVASATPVA 

SEG xxxxx xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ VTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESSI 

SEG xxxxxxxxxxxxx xxxxxxxxxx . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccceeeccccc 

SEQ PQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAVM 

SEG xxxxx . . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQSRPPVTTSNAIPPAVVATVSATRAQS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQL 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ HTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHA 

SEG xxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPS 

SEG xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeecccccccc 

SEQ VPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQ 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GLQPAPMGTQQPQPEGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AQGSSPRPSILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccceeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ EIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQ 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HVISTEEGDMMETNSTDDEKSTAKSLLVKAEKRKSPPKEYIDEEGVRYVPVRPRPPITLL 

SEG xxxxxxxxxxx. . . . 

PRD ccccccccccccccccccccchhhhhhhhhccccccccccccccccccccccccccccee 

SEQ RHYRNPWKAAYHHFQRYSDVRVKEEKKAMLQEIANQKGVSCRAQGWKVHLCAAQLLQLTN 

SEG 

PRD eeccccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeecccceeehhhhhhhhhhc 

SEQ LEHDVYERLTNLQEGIIPKKKAATDDDX^HRINELIQGNHQRCKLVMDQISEARDSMLKVL 

SEG 

PRD cchhhhhhhhhhhceeeeccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DHKDRVLKLLNKNGTVKKVSKLKRKERV 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhccccceeeeeeeeccccc 


Prosite for DKFZphtes3_2all.2 

PS00001 818->822 ASN_GLYCOSYLATION PDOC00001 
PS00001 854->858 ASN_GLYCOSYLATION PDOC00001 
PS00001 1033->1037 ASN_GLYCOSYLATION PDOC00001 
PS00004 872->876 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 1037->1041 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 68->71 PKC_PHOSPHO_SITE PDOC00005 
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005 
PS00005 242->245 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005 
PS00005 442->445 PKC_PHOSPHO_SITE PDOC00005 
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005 
PS00005 665->668 PKC_PHOSPHO_SITE PDOC00005 
PS00005 831->834 PKC_PHOSPHO_SITE PDOC00005 
PS00005 862->865 PKC_PHOSPHO_SITE PDOC00005 
PS00005 940->943 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1035->1038 PKC_PHOSPHO_SITE PDOC00005 
PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006 
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006 
PS00006 88->92 CK2_PHOSPHO_SITE PDOC00006 
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 
PS00006 473->477 CK2_PHOSPHO_SITE PDOC00006 
PS00006 844->848 CK2_PHOSPHO_SITE PDOC00006 
PS00006 855->859 CK2_PHOSPHO_SITE PDOC00006 
PS00006 959->963 CK2_PHOSPHO_SITE PDOC00006 
PS00006 984->988 CK2_PHOSPHO_SITE PDOC00006 
PS00008 15->21 MYRISTYL PDOC00008 
PS00008 16->22 MYRISTYL PDOC00008 
PS00008 36->42 MYRISTYL PDOC00008 
PS00008 233->239 MYRISTYL PDOC00008 
PS00008 372->378 MYRISTYL PDOC00008 
PS00008 533->539 MYRISTYL PDOC00008 
PS00008 535->541 MYRISTYL PDOC00008 
PS00008 590->596 MYRISTYL PDOC00008 
PS00008 768->774 MYRISTYL PDOC00008 
PS00009 19->23 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_2all.2) 
group: metabolism 

DKFZphtes3_2al7 encodes a novel 574 amino acid protein without similarity to known proteins. 
The novel protein contains a thiol protease cys pattern. Eukaryotic thiol proteases (EC 
3.4.22.-) are a family of proteolytic enzymes containing an active site cysteine. Cathepsins 
belong to this protease family. 

The new protein can find application in modulation of proteolytic processes and as a new 
enzyme for proteomic analysis and biotechnological production processes. 


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 2312 bp 

Poly A stretch at pos. 2300, polyadenylation signal at pos. 2273 


1 GTTTTCACCT GATCATTAGA AACTAATGAA ACACCTTTTA AGTCTTATGA 
51 ATTCAGGTTA CACTGTTTTC CAGATGCCTT GGCAGCTGGT ACAGGGCCTC 
101 TGAAAAATGG AACCAAATTC TCTGAGGACT AAAGTCCCAG CTTTCTTATC 
151 TGATTTGGGG AAGGCCACAT TGAGGGGAAT CAGAAAGTGT CCCCGATGTG 
201 GCACATACAA TGGAACCCGG GGACTGAGCT GTAAGAACAA GACATGTGGA 
251 ACCATATTCC GCTACGGTGC ACGCAAGCAG CCTAGTGTTG AAGCTGTCAA 
301 AATCATTACA GGCTCTGATC TTCAGGTCTA CTCAGTGCGG CAAAGAGACC 
351 GGGGCCCTGA TTACCGATGC TTTGTGGAGC TCGGGGTTTC AGAGACAACA 
401 ATCCAGACAG TGGATGGGAC GATCATCACT CAGCTGAGCT CTGGACGGTG 
451 TTATGTCCCC TCATGCCTGA AAGCTGCCAC TCAAGGCGTT GTGGAAAACC 
501 AGTGCCAGCA CATCAAGCTG GCGGTGAACT GCCAGGCAGA GGCCACCCCT 
551 CTGACCCTGA AGAGCTCGGT CCTGAATGCA ATGCAGGCCT CCCCGGAAAC 
601 CAAAGAGACC ATCTGGCAGT TGGCCACGGA ACCCACAGGT CCTCTGGTGC 
651 AGAGAATTAC TAAAAACATC TTGGTGGTGA AATGCAAGGC AAGCCAGAAG 
701 CACAGTTTGG GGTATTTGCA TACATCTTTT GTGCAGAAAG TCAGTGGCAA 
751 AAGCTTGCCT GAGCGCCGCT TCTTCTGCTC CTGTCAGACT CTGAAATCGC 
801 ACAAGTCAAA TGCCTCCAAG GATGAGACAG CCCAGAGATG CATTCATTTC 
851 TTTGCTTGCA TCTGTGCCTT TGCCAGTGAT GAGACACTGG CTCAGGAATT 
901 CTCAGACTTC CTAAATTTTG ATTCCAGCGG TCTTAAAGAG ATTATTGTAC 
951 CCCAGTTAGG TTGCCATTCA GAATCAACAG TATCTGCTTG TGAGTCTACT 
1001 GCCTCTAAGT CAAAGAAGAG GAGAAAGGAT GAAGTATCTG GTGCACAGAT 
1051 GAACAGTTCA CTACTGCCTC AAGATGCAGT GAGCAGTAAT CTAAGGAAAA 
1101 GTGGCCTGAA AAAGCCTGTG GTTGCTTCCT CGTTAAAAAG GCAGGCCTGT 
1151 GGTCAGCTGT TAGATGAGGC ACAAGTGACT TTATCCTTCC AAGACTGGCT 
1201 GGCCAGTGTC ACAGAACGCA TCCATCAAAC CATGCACTAT CAGTTTGATG 
1251 GCAAACCAGA ACCATTGGTG TTCCACATTC CTCAGTCATT TTTTGATGCC 
1301 CTGCAACAAA GAATATCTAT AGGAAGTGCA AAAAAACGGC TCCCCAACTC 
1351 CACCACAGCT TTTGTTCGGA AAGATGCCTT GCCACTGGGA ACCTTTTCCA 
1401 AGTATACTTG GCATATCACT AATATCCTGC AAGTTAAACA AATCTTAGAT 
1451 ACCCCAGAGA TGCCCTTGGA AATCACCCGT AGCTTTATCC AGAACCGAGA 
1501 TGGGACTTAT GAGCTATTTA AATGCCCTAA AGTGGAAGTA GAAAGCATAG 
1551 CAGAAACCTA CGGTCGTATA GAAAAACAAC CAGTGCTGCG ACCCTTGGAA 
1601 CTAAAAACTT TTCTCAAAGT TGGCAACACT TCCCCAGATC AAAAGGAGCC 
1651 AACACCTTTC ATCATCGAGT GGATCCCAGA TATCCTTCCC CAATCTAAGA 
1701 TTGGCGAGCT GCGGATCAAG TTTGAGTATG GCCACCACCG GAATGGGCAT 
1751 GTGGCGGAGT ACCAAGACCA GCGGCCCCCC TTGGACCAGC CCTTGGAACT 
1801 GGCCCCTCTG ACCACTATTA CTTTCCCTTA AAGCAAAACA AGATAATAAT 
1851 CTTTTGCTGC TTAATTTGCA CATCCCCACC CCTTGACAAC TTTAAATGCT 
1901 AGTTAGGCAC TTAGATGGCC CTGTTCCTTG GTAAACTGCT CTTAGCTAAG 
1951 ATGCAAATTC TCAGTGCTTT CAAGTGGATT CTGTTGAAGA AAATCTCTTG 
2001 TAAATAGCCT TTTTGATGCT GCTGTGTACA GTCTTCATTA TGCATTGGGC 
2051 AGTATTTCTG GCTAGAGTTT TAAAAGGAAC AGAAAGAAAA CCAGCTTATT 
2101 TTCCTTCTTA CGGACTCATC TTTAGCGTTT ATTTCAACCT TTTGCTAATT 
2151 CTCTGAGAAA TCTGCAGCAC TCAGCCATAC ACCAACAGTG TTGGAAAGTT 
2201 AACACCCTGG TTAGGGCAGA ATGTTAAAGA CCATCTTGGC AGAGTTCCAG 
2251 CCACGCTCTT TATTCTGTTC TCAAATAAAG CAGTGTCACT AGTTTTTCCT 
2301 AAAAAAAAAA AA 


BLAST Results 


No BLAST result 
Medline entries 

No Medline entry 

Peptide information for frame 2 


ORF from 107 bp to 1828 bp; peptide length: 574 
Category: putative protein 
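
The ORF coordinates and the peptide length given above are linked by simple arithmetic: the inclusive interval from 107 bp to 1828 bp spans 1722 bases, i.e. 574 codons. The following is a minimal Python sketch of that check. The function name `orf_peptide_length` is hypothetical, and the sketch assumes (as the numbers in this report suggest) that the reported ORF end does not include the stop codon.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Coordinates are 1-based and inclusive, as in the report above.
    Assumption: the end coordinate excludes the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 107 bp to 1828 bp, as reported for this clone
print(orf_peptide_length(107, 1828))
```

The same relation can be applied to any other ORF entry in this listing to cross-check a reported peptide length against its coordinates.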


1 MEPNSLRTKV PAFLSDLGKA TLRGIRKCPR CGTYNGTRGL SCKNKTCGTI 
51 FRYGARKQPS VEAVKIITGS DLQVYSVRQR DRGPDYRCFV ELGVSETTIQ 
101 TVDGTIITQL SSGRCYVPSC LKAATQGVVE NQCQHIKLAV NCQAEATPLT 
151 LKSSVLNAMQ ASPETKQTIW QLATEPTGPL VQRITKNILV VKCKASQKHS 
201 LGYLHTSFVQ KVSGKSLPER RFFCSCQTLK SHKSNASKDE TAQRCIHFFA 
251 CICAFASDET LAQEFSDFLN FDSSGLKEII VPQLGCHSES TVSACESTAS 
301 KSKKRRKDEV SGAQMNSSLL PQDAVSSNLR KSGLKKPVVA SSLKRQACGQ 
351 LLDEAQVTLS FQDWLASVTE RIHQTMHYQF DGKPEPLVFH IPQSFFDALQ 
401 QRISIGSAKK RLPNSTTAFV RKDALPLGTF SKYTWHITNI LQVKQILDTP 
451 EMPLEITRSF IQNRDGTYEL FKCPKVEVES IAETYGRIEK QPVLRPLELK 
501 TFLKVGNTSP DQKEPTPFII EWIPDILPQS KIGELRIKFE YGHHRNGHVA 
551 EYQDQRPPLD QPLELAPLTT ITFP 
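
The peptide above is the conceptual translation of the cDNA starting at the ATG at position 107. As an illustration, the sketch below translates the first 24 bases of that ORF (taken from the nucleotide listing above) with the standard genetic code. The helper name `translate` and the compact codon-table encoding are illustrative assumptions, not part of the original analysis pipeline.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 107-130 of the cDNA listed above
print(translate("ATGGAACCAAATTCTCTGAGGACT"))
```

The result can be compared with the first eight residues of the peptide listing (MEPNSLRT).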


BLASTP hits 


No BLASTP hits available 


Alert BLASTP hits for DKFZphtes3_2al7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2al7, frame 2 

Report for DKFZphtes3_2al7.2 

[LENGTH] 574 

[MW] 64076.89 

[pI] 9.15 

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 14 

[PROSITE] ASN_GLYCOSYLATION 5 

[PROSITE] THIOL_PROTEASE_CYS 1 

[KW] Alpha_Beta 

SEQ MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS 
PRD ccccccccccchhhhhcccchhhhhcccccccccccccccccccccccceeeeccccccc 

SEQ VEAVKIITGSDLQVYSVRQRDRGPDYRCFVELGVSETTIQTVDGTIITQLSSGRCYVPSC 
PRD ceeeeeeecccceeeeeccccccccceeeeeecccccceeeccceeeeeecccccccchh 

SEQ LKAATQGVVENQCQHIKLAVNCQAEATPLTLKSSVLNAMQASPETKQTIWQLATEPTGPL 
PRD hhhhhhhhcchhhhheeehhhhhhhcccccchhhhhhhhhcccchhhhhhhhhcccccch 

SEQ VQRITKNILVVKCKASQKHSLGYLHTSFVQKVSGKSLPERRFFCSCQTLKSHKSNASKDE 
PRD hhhhhhheeeeeecccccccccccceeeeeeecccccccceeeecccccccccccccccc 

SEQ TAQRCIHFFACICAFASDETLAQEFSDFLNFDSSGLKEIIVPQLGCHSESTVSACESTAS 
PRD hhhhhhhhhhhhhhhhhchhhhhhhhhhhccccccceeeeeecccccccceeeccccccc 

SEQ KSKKRRKDEVSGAQMNSSLLPQDAVSSNLRKSGLKKPVVASSLKRQACGQLLDEAQVTLS 
PRD ccchhhhhccccccccccccccccchhhhhhhccccceeehhhhhhhhhchhhhhhhhhh 

SEQ FQDWLASVTERIHQTMHYQFDGKPEPLVFHIPQSFFDALQQRISIGSAKKRLPNSTTAFV 
PRD hhhhhhhhhhhhhhhhhhhcccccccceeehhhhhhhhhhhhhhhhcccccccccceeee 

SEQ RKDALPLGTFSKYTWHITNILQVKQILDTPEMPLEITRSFIQNRDGTYELFKCPKVEVES 
PRD ecccccccccceeeeehhhhhhhhhhhccccccccceeeeeeccccceeeecccceeeeh 

SEQ IAETYGRIEKQPVLRPLELKTFLKVGNTSPDQKEPTPFIIEWIPDILPQSKIGELRIKFE 
PRD hhhhhhhhhccccccccccceeeeecccccccccccceeeeecccccccccccceeeeee 

SEQ YGHHRNGHVAEYQDQRPPLDQPLELAPLTTITFP 
PRD ecccccceeeeccccccccccccccccceeeccc 

PS00001 35->39 ASN_GLYCOSYLATION PDOC00001 
PS00001 44->48 ASN_GLYCOSYLATION PDOC00001 
PS00001 235->239 ASN_GLYCOSYLATION PDOC00001 
PS00001 316->320 ASN_GLYCOSYLATION PDOC00001 
PS00001 414->418 ASN_GLYCOSYLATION PDOC00001 
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 
PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005 
PS00005 41->44 PKC_PHOSPHO_SITE PDOC00005 
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005 
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005 
PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005 
PS00005 196->199 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 228->231 PKC_PHOSPHO_SITE PDOC00005 
PS00005 231->234 PKC_PHOSPHO_SITE PDOC00005 
PS00005 302->305 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 369->372 PKC_PHOSPHO_SITE PDOC00005 
PS00005 407->410 PKC_PHOSPHO_SITE PDOC00005 
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006 
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 360->364 CK2_PHOSPHO_SITE PDOC00006 
PS00006 367->371 CK2_PHOSPHO_SITE PDOC00006 
PS00006 394->398 CK2_PHOSPHO_SITE PDOC00006 
PS00006 480->484 CK2_PHOSPHO_SITE PDOC00006 
PS00006 508->512 CK2_PHOSPHO_SITE PDOC00006 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 104->110 MYRISTYL PDOC00008 
PS00008 127->133 MYRISTYL PDOC00008 
PS00008 312->318 MYRISTYL PDOC00008 
PS00139 109->121 THIOL_PROTEASE_CYS PDOC00126 


(No Pfam data available for DKFZphtes3_2al7.2) 
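
The Prosite entries listed for this peptide are short sequence patterns. As an illustration of how such hits arise, the sketch below scans the first 30 residues of the peptide for the protein kinase C phosphorylation site pattern (PS00005, `[ST]-x-[RK]`), written here as a Python regular expression. The function name `find_sites` is hypothetical. The end coordinate is assumed to be the position following the last matched residue; this convention is inferred from the numbers in the listing and is not stated in the source.

```python
import re

# PS00005 (PKC_PHOSPHO_SITE): [ST]-x-[RK]; the lookahead allows overlapping hits
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def find_sites(peptide, pattern=PKC_SITE):
    """Return (start, end) pairs for every pattern hit, using 1-based starts.

    Assumption: 'end' is start plus the match length, i.e. the position
    following the last matched residue, as the listing appears to use.
    """
    sites = []
    for match in pattern.finditer(peptide):
        start = match.start() + 1
        end = start + len(match.group(1))
        sites.append((start, end))
    return sites


# First 30 residues of the DKFZphtes3_2al7 peptide
for start, end in find_sites("MEPNSLRTKVPAFLSDLGKATLRGIRKCPR"):
    print(f"PS00005 {start}->{end} PKC_PHOSPHO_SITE")
```

Patterns of this kind are short and occur frequently by chance, so a hit indicates a candidate site only.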
DKFZphtes3_2dl5 


group: testes derived 

DKFZphtes3_2dl5 encodes a novel 274 amino acid protein with similarity to C. elegans F25H2.1. 

The novel protein contains a Pfam predicted C2-domain. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific genes. 


similarity to C.elegans F25H2.1 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3615 bp 

Poly A stretch at pos. 3603, polyadenylation signal at pos. 3578 

^f^'^^l CGAGGTGACA ACTGTCTCCG TCGCAGGCTC CGGCGGGGGC 
,nJ GCCCGGCGCG TCACTGTCGG GTCGGCGAGC CACGGGGGCC 

I?, ^^S^^*^**^ CATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC 
151 ATCGGTGAGC TCCCGCAGGA CTTCCTCCGC ATCACGCCCA CACAGCAGCA 
201 GCGGCAGGTC CAGCTGGACG CCCAGGCGGC CCAGCAGCTG CAGTACgSg 
251 GCGCAGTGGG CACCGTGGGC CGACTGAACA TCACGGTGG? ACAG^^I 
«J rr^""^"" *TTACGGCAT GACCCGCATG GACCCCTACT gS^^G 
An, GCGGTGTACG AGACGCCCAC GGCACACAAT GGCGCCAAGA 

22? iJSS^S^IS S^^^^GCTC ATCCACTGCA CGGTGCCCCC AGGCGTGGAC 
451 TCTTTCTATC TCGAGATCTT CGATGAGAGA GCCTTCTCCA TGGACGACCG 
501 CATTGCCTGG ACCCACATCA CCATCCCGGA GTCCCTGAGG CMG^gI 
«m rrr^^^^ GTGGTACAGC CTGAGCGGGA GGCAGgI^ 

601 ggcatgatca acctcgtcat gtcctacgcg ctgcttccag ctgccatggt 

?m SJ^.SS^'^^'^ CAGCCCGTGG TCCTGATGCC AACAGTGTAC CAGCAGGGCG 

E??E™^^t gcccatcaca gggatgcccg ctgtctgtag ccccggcatg 

anJ ^^^^^^SJ*^ CCCTGCCCCC GGCCGCCGTG AACGCCCAGC CCCGCTGTAG 
801 CGAGGAGGAC CTGAAAGCCA TCCAGGACAT GTTCCCCAAC ATGGACCAGG 
851 AGGTGATCCG CTCCGTGCTG GAAGCCCAGC GAGGGAACAA gSS^C 

«i tJ™'''^^ tgctgcagat gggggaggag ccatagagcc tctgccSga 
inm IS^I^IJS cccccgctct ttggacacgc cgacccggcg ctccccaagg 
^l^l?^'^ caacaagatt cccgtgaaag agcacccgtg tcgccccctc 

^ES^*^^^ CTGTGCCGCC CCGTCCACAC CTGrTCTTGG GTGCATGTGG 
JJ?, ^nr^^'^^ CCTGGCGGTC CAGGACGGGG CGGGGGCTCC CCTCCCATCT 
1151 CGTGCTGGGA GGTCTCAGCG CGCTCTCCTG TCCCTGGGAC GTGCGTCTCT 
f^l^^l^ CCGTTCTGGA AAATGCTCTT GCTGTAGAGA GCAGCTGCTT 
1251 CTGCCAGGGT GTTGGAGGTG GTGGAGCGCC TTCCGATTCC ATTCATGGCA 
llVi I^JS?^? TGATGTAATT GGAATAGAGC TGTTGATT?A A^S 
1351 AATCCCTCAC ACTGTGGGTT TTTTTTAGAA CTTCCCAGAC GAAAACTCAC 
1401 GCCCTTGCCC TAACGCGCTT TGCTGTGAGC CTGGCCCCTG CCCAGGGCTT 
^12'"'^°^° AGCTGAGCAG CTTCCTGTGG ATGGTGTGGG GCCGGCCtS 
^n^^'''''"' *CCTGGCCAC TGTCCAGCCA GCCTTGTGAC AGACTCCGGC 
1551 CTGAAGGCAG AATGAACCCA CACCTGGAGT GAGGAAGGGG GCCTGGCACG 
^S^^^ CTCTGCCTGA TTGCCAGCCA GCGGGCATCT GAAGCCGGGT 
^.^'^'^^ CCGGAGGCTG CCGTCCGTCT CTCCTGCTGC GCTCGTGCCA 
Ml] ^l^^nl?^ TGTCCTCCCA GGGAGCTTCT CTTCTCAACA GGCCTTGCGA 
1751 GGCTGGGGTG AGAGGTGATA GAGGCAGCAC TGTGCATGAT TCCGAGAGGG 
1801 TGTGGTGGCA CTGCCAGCCG ACTGCTGACA GCTTGGGAGC TG^TCT^^ 
1851 AGGACGTGGG TTCAGCGTGG GCGAGGAAAG CCTGGCGAGC GTGGCCCTGT 
i J.] ^^^"^"^^ TGAGGCGGGA GGCGCTCACT TACCTCTGAC TGCCTGGGCG 
^^^^l^i CATCTTGGCC TACAG«SACAG ATTTTAGGTG ACACCTGGT? 
2001 ATGACAGTCA GAAATTTGAG AAGCTTCTCA CAAGTGATGC ACTTTAAATA 
tl^'^^.^l'^'^ CATTGAGACA CCTGCATGTC TGGTGTTTGT GGTTCAAGTG 
I s, ISJ^n^f^^ GGCCTTCGGA TGTAAACCCA CTGATAACGG ACAGAAAGAG 
oi^? AATGCCCACA AGTGGGTCTT CTGTGGAAGA TGCAGAAGGA GGAAGTTAGT 
IIV; ^JI^S^IJI ™='rCTTTTT CTCCCTCAAA AAAATAGGTT AAGTTTCAGT 
2251 GCCAGCTAGA AAATACTGCT TTCTGCCATC GATTGGGGGT GGTTTTTGTC 
IIV; ^'^l^^'^ TTGATAAATA TTTATTTTTG TAAACTTGAA GTGTGTGGTG 
2351 GCCGTCGGGG AGGGACATGC TGGCAGCAGG CGCCTTCTTC AGCTGTGGGT 
»rl?^^^^ TTTGATCCTT TGAAGAAGAA AGaSJa J^tcScaS 
2451 AGACGCCGAC CACTCAGACG GAGGGGCCCC TGGGATTCCC TGTCTCAGAT 
2501 GGCCTGGTCT TACGCCTGTG TAGATTTCTT CTCCATTGGG AMGAA^TG 
2551 TCAGGCGGGA CTGGAACGTT CTAGATGGTA TGTTCCGTGA TATTAACAAC 
2601 TCTAACCCAG GACAGACCAC AAGCCACACT CAGAGGCCTC ACTGTgS^G 
2651 GGGCTTCGGT GTCCAGGCGC CCAGGTGTGG CCACCAGCAC CGGTTTCTGC 
2701 CTTCGCGTTG CTGGGGTGCA GTGAGACTGC CACACGCGTG CACATGTGGC 
2751 TCTGTGGGTG TCTCCTAGAG AGGACGTGGC CCCTGCTGCC AGCCCTTGAG 
2801 CAGCCCGTGT GGGGGCCCGA GGGACCCACA CAGTGGGGGC CAGCCTCGCT 
2851 GGAGGGAGAG CAACCCTTTG CCGATGACCA CGCTTGCCGC CATCTCTTAG 
2901 TTTTCTTTTT CACAAGCGCT TTATTTTTTT AATAGACAAA TCACATTTTG 
2951 CAAGGCCTTT AATTAAATAA GATTCTTCTT TCCTTCATTT TATGCTTTAT 
3001 TTCCTGTTTG AAGGCTTACT GTAGAAGTGG CTTACTGTAG AAGCAGCTTG 
3051 CTGAGCCCCT CCGAGCGGTC CCCAGAATTA GCTGGTTCAC AACCCCCACC 
3101 CTCCCCCGCC CCCGCCTGTG TCAGGTGTGG ATGAGGTCGT CACACTCAGA 
3151 AGGACAGGCT TGTCTGCCAG CTCACAAGGG GAGGCTGCAG TGGGTTTGGG 
3201 AGCTGGGTTT AGGCCCCTGG TGTCTGAGGG CCCAGGCCTT GCCAGCCTCT 
3251 GCTGCTCCTG CTCCTGGGTT TGAAGATGCA GGCCGATCGC CAGCTCCGTG 
3301 GCAGCGGTCA CTAAGGACAG CCTGACTGTG CCATCTTGGA GCCTCAGGCG 
3351 GGGCTCCGGA GATAGAAGAC AGGTCGCCGG AGGCTCCCCC TCCTCTCCTC 
3401 TCCCCTCTGC AGATGCTCCC TGGGCGCTAC CCTGCAGGGT GCCAGGCAGG 
3451 AGTGGTCTCA GAACGTGCGC TTCTGATTAT TTTACTGGGG TCCATTGTCC 
3501 AGATTTTTCT TTGATTGTAA AATATATTTT TACTTTTTAG TCTTCTAATT 
3551 TAATAAATGA TCCATATAAA AATAGAGAAA TAAAGTCCTT TAAGGGAAGG 
3601 TTTAAAAAAA AAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 112 bp to 933 bp; peptide length: 274 
Category: similarity to unknown protein 
Classification: no clue 


1 MATTVSTQRG PVYIGELPQD FLRITPTQQQ RQVQLDAQAA QQLQYGGAVG 
51 TVGRLNITVV QAKLAKNYGM TRMDPYCRLR LGYAVYETPT AHNGAKNPRW 
101 NKVIHCTVPP GVDSFYLEIF DERAFSMDDR IAWTHITIPE SLRQGKVEDK 
151 WYSLSGRQGD DKEGMINLVM SYALLPAAMV MPPQPVVLMP TVYQQGVGYV 
201 PITGMPAVCS PGMVPVALPP AAVNAQPRCS EEDLKAIQDM FPNMDQEVIR 
251 SVLEAQRGNK DAAINSLLQM GEEP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2dl5, frame 1 

TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2, 
N = 1, Score = 385, P = 1.1e-35 


>TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 
Length = 457 

HSPs: 

Score = 385 (57.8 bits), Expect = 1.1e-35, P = 1.1e-35 
Identities = 77/182 (42%), Positives = 118/182 (64%) 

Query: 4 TVSTQRGPVYIGELPQDFLRIT-PTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVVQA 62 

TV+ +R V+GELP FLR+ PQQ+++Q+ + + T GRL++T+++A 

Sbjct: 5 TVAERRRQVLVGELPPHFLRLAVPIQQTAEPEI-VQP-RMVSFVPP-NTRGRLSVTILEA 61 

Query: 63 KLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIFDE 122 

L KNYG+ RMDPYCR+R+G ++T AN + P WN+ ++ +P V+S Y++IFDE 
Sbjct: 62 NLVKNYGLVRMDPYCRVRVGNVEFDTNVAANAGRAPTWNRTLNAYLPMNVESIYIQIFDE 121 

Query: 123 RAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYAL— LPAAMV 180 

+AF D+ I AW HI +P ++ 6 D+++ LSG+QG+ KEGMI+L S+A LP 
Sbjct: 122 KAFGPDEVIAWAHIMLPLAIFNGDNIDEYFQLSGQQGEGKEGMIHLHFSFAPIDLPLQQA 181 
Query: 181 MPPQP 185 

P +P 

Sbjct: 182 APAEP 186 

Score = 92 (13.8 bits), Expect = 1.8e-01, P = 1.7e-01 
Identities = 26/68 (38%), Positives = 38/68 (55%) 

Query: 194 QQGVGYVPITGMPAVCSPGMVPV— ALP-PAAVNAQPRCSEEDLKAIQDMFPNMDQEVI 249 

G + + +p +p+ A P PA +EED K IQ+MFP +D+EVT 

Sbjct: 156 QQGEGKEGMIHLHFSFAPIDLPLQQAAPAEPAPAPLPVEITEEDTKEIQEMFPIVDKEVI 215 
Query: 250 RSVLEAQR 257 

+ +LE +R 
Sbjct: 216 KCILEERR 223 

Pedant information for DKFZphtes3_2dl5, frame 1 
Report for DKFZphtes3_2dl5.1 

[LENGTH] 274 
[MW] 30281.97 
[pI] 5.68 

[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 4e-36 

[PFAM] C2 domain 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 16.42 % 


SEQ MATTVSTQRGPVYIGELPQDFLRITPTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVV 
SEG xxxxxxxxxxxxxxxxx 
PRD cccccccccceeeeeccccceeeecccchhhhhhhhhhhhhhhhhcccccceeeeceeeh 

SEQ QAKLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIF 
SEG 
PRD hhhhhhhhcccccccchhhhheeeeeecccccccccccccceeeeeccccc 

SEQ DERAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYALLPAAMV 
SEG 
PRD cccccccccceeeeccccccccccccccceeeeeccccccccccceeeeehhhhhhh 

SEQ MPPQPVVLMPTVYQQGVGYVPITGMPAVCSPGMVPVALPPAAVNAQPRCSEEDLKAIQDM 
SEG xxxxxxxxxx ........................ xxxxxxxxxx 
PRD ccccceeeeeeeeecccccccccccceeecccccccccccceeeeccccchhhhhhhhhc 

SEQ FPNMDQEVIRSVLEAQRGNKDAAINSLLQMGEEP 
SEG 
PRD ccccchhhhhhhhhhhccccchhhhhhhhhhccc 
(No Prosite data available for DKFZphtes3_2dl5.1) 
Pfam for DKFZphtes3_2dl5.1 
HMM_NAME C2 domain 

HMM *LtVrIIeARNLWkMDMnGfSDPYVKVdMdPdpkDtkKWKTkTiWNNGLN 
L++++++A+ + + M+ DPY+++ + + + +T T +N N 

Query 55 LNITVVQAKLAKNYGMT-RMDPYCRLRLGYAVY ETPTAHNGAKN 97 

HMM PVWNEEeFvFedlPyPdlqrkMLRFaVWDWDRFSRBDFIGHCi* 

P+WN + +p + + ++++D+ FS +D 1+ + 
Query 98 PRWN-KVIHCT-VPPGVDSF YLEIFDERAFSMDDRIAWTH 135 
DKFZphtes3_2el2 


group: Transcription Factors 

DKFZphtes3_2el2 encodes a novel 849 amino acid protein with similarity to zinc finger 
proteins. 

The new protein is a putative transcription factor with three C2H2 zinc fingers. Additionally, 
a cytochrome c family heme-binding site signature is present in the protein, which is only 
found in cytochrome c related proteins. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 


similarity to finger proteins 

complete cDNA, complete cds, 5 EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3205 bp 

Poly A stretch at pos. 3192, polyadenylation signal at pos. 3171 


1 GGCACGGCCG GGTCCTGGCT GGCCAAACGA GGCTCGCGGA AGCAGCAGCC 
51 GCCGCCTGAC CGCAGCTGGA TTTTGAAGAT TGATCCAAGG GACTGTATTA 
101 ATTTCAGGAA TTGATTTGAA AGACACTGGC TCTGCCACTT AACAGCCATG 
151 TAACCTTGGA TATGGAAGAA AGTAGCAGTG TTGCCATGTT GGTGCCAGAT 
201 ATTGGGGAAC AGGAAGCTAT ACTGACTGCT GAAAGTATCA TCAGTCCTTC 
251 ATTGGAAATT GATGAACAAA GAAAAACTAA ACCAGATCCA TTAATCCATG 
301 TTATCCAGAA GTTAAGCAAG ATAGAAAAAT GAAAAGTCAC AAAAATGTCT 
351 TTTAATTGGG AAGAAACGCC CACGTTCAAG TGCTGCAACA CACTCTCTTG 
401 AAACCCAAGA ACTTTGTGAG ATTCCGGCTA AAGTAATCCA GTCACCTGCT 
451 GCTGATACTA GAAGGGCTGA GATGTCACAA ACAAATTTTA CCCCTGACAC 
501 TCTTGCCCAG AATGAAGGGA AGGCTATGTC TTATCAGTGT AGCCTTTGTA 
551 AGTTTCTATC ATCATCCTTT TCCGTGTTAA AAGATCATAT TAAGCAACAT 
601 GGTCAGCAAA ATGAAGTGAT ACTGATGTGC TCAGAGTGCC ATATTACATC 
651 TAGAAGCCAG GAGGAACTTG AAGCCCACGT GGTGAATGAC CATGACAATG 
701 ATGCCAATAT CCACACCCAA TCCAAAGCCC AACAGTGCGT AAGCCCCTCC 
751 AGCTCTTTGT GTCGGAAAAC CACAGAAAGA AATGAAACCA TTCCAGATAT 
801 CCCAGTAAGT GTGGACAATC TACAGACTCA TACTGTCCAA ACTGCATCTG 
851 TGGCAGAAAT GGGTAGGAGG AAATGGTATG CATACGAACA GTACGGCATG 
901 TATCGATGCT TGTTTTGTAG TTATACTTGT GGCCAGCAGA GAATGTTGAA 
951 AACACACGCT TGGAAACATG CTGGGGAGGT TGATTGCTCC TATCCAATCT 
1001 TTGAAAATGA AAATGAACCC CTAGGCCTGC TGGATTCTTC AGCAGCTGCT 
1051 GCGCCTGGTG GGGTCGATGC AGTCGTCATT GCTATTGGAG AGAGTGAACT 
1101 GAGTATCCAC AATGGGCCAT CAGTGCAAGT GCAGATTTGC AGCTCAGAAC 
1151 AGTTATCATC TTCATCTCCT TTAGAACAGA GTGCAGAAAG AGGAGTACAC 
1201 CTAAGTCAGT CAGTTACCCT GGACCCCAAT GAGGAAGAAA TGCTAGAAGT 
1251 GATTTCTGAT GCAGAGGAGA ATCTGATTCC TGATAGCCTG CTTACATCAG 
1301 CACAGAAAAT CATCAGCAGC AGCCCCAATA AAAAAGGGCA TGTTAACGTG 
1351 ATAGTGGAGC GATTGCCAAG TGCTGAAGAA ACCCTTTCAC AGAAGCGCTT 
1401 CCTCATGAAC ACTGAAATGG AAGAAGGGAA GGACCTGAGC CTGACAGAAG 
1451 CTCAGATTGG GCGCGAAGGA ATGGATGATG TTTATCGTGC TGATAAATGT 
1501 ACTGTTGATA TTGGGGGATT GATCATAGGC TGGAGCAGTT CAGAGAAAAA 
1551 AGACGAGTTA ATGAATAAAG GCCTGGCTAC TGATGAGAAT GCCCCACCAG 
1601 GCCGGAGAAG GACAAATTCT GAGTCTCTTC GATTACACTC ATTAGCTGCA 
1651 GAAGCCCTTG TCACAATGCC TATAAGAGCT GCAGAGTTGA CAAGAGCCAA 
1701 CCTGGGGCAC TATGGAGATA TAAACCTTTT AGATCCAGAT ACTAGTCAAA 
1751 GGCAAGTAGA TAGTACATTG GCAGCGTACT CAAAAATGAT GTCGCCACTT 
1801 AAAAACTCTT CAGATGGATT AACTAGTCTT AACCAAAGCA ACTCCACCTT 
1851 GGTAGCACTC CCAGAGGGTA GGCAGGAATT GTCAGATGGG CAGGTTAAGA 
1901 CAGGCATCAG CATGTCCTTA CTCACCGTCA TTGAAAAATT GAGAGAAAGG 
1951 ACAGACCAAA ACGCTTCAGA CGATGACATT TTGAAAGAGT TGCAGGACAA 
2001 CGCCCAGTGC CAACCCAACA GCGATACAAG TTTGTCCGGA AACAATGTGG 
2051 TGGAATACAT CCCGAATGCT GAACGACCCT ACCGTTGCCG CCTGTGTCAC 
2101 TACACAAGTG GCAACAAGGG CTACATCAAG CAGCACTTAC GAGTCCATCG 
2151 ACAGAGACAG CCTTATCAGT GTCCTATCTG CGAGCACATA GCGGACAACA 
2201 GCAAAGATTT GGAGAGTCAC ATGATCCACC ACTGTAAGAC AAGAATATAC 
2251 CAGTGCAAGC AGTGTGAAGA ATCCTTCCAT TATAAGAGTC AATTGAGGAA 
2301 CCATGAGAGA GAACAGCACA GTCTTCCAGA TACCTTGTCA ATAGCAACTT 
2351 CTAATGAGCC AAGAATTTCC AGTGATACAG CTGATGGAAA ATGTGTCCAG 
2401 GAAGGGAATA AGTCTTCAGT CCAGAAACAA TATAGATGTG ATGTGTGTGA 
2451 TTATACAAGT ACAACATATG TTGGTGTCAG AAACCACAGG CGAATCCATA 
2501 ACTCTGATAA GCCGTACAGA TGCTCTCTGT GTGGGTATGT GTGTAGCCAT 
2551 CCTCCTTCTT TGAAGTCTCA TATGTGGAAA CATGCAAGTG ACCAAAATTA 
2601 CAACTACGAA CAAGTAAACA AGGCTATTAA CGACGCGATT TCACAAAGTG 
2651 GCAGAGTTCT GGGGAAATCC CCTGGAAAGA CTCAATTAAA GAGCAGTGAA 
2701 GAGAGTGCAG ATCCCGTCAC TGGAAGTTCG GAAAATGCAG TGTCATCTTC 
2751 AGAACTGATG TCCCAGACTC CCAGTGAAGT TCTGGGTACC AACGAGAATG 
2801 AGAAACTGAG CCCTACAAGT AATACCTCAT ATAGTTTAGA AAAAATCTCC 
2851 AGTCTGGCCC CTCCTAGCAT GGAGTACTGC GTTTTACTCT TCTGCTGTTG 
2901 TATTTGTGGT TTTGAATCAA CCAGCAAAGA AAACCTCTTG GATCATATGA 
2951 AAGAGCACGA GGGTGAAATT GTAAACATCA TCCTGAATAA GGACCACAAT 
3001 ACAGCTCTAA ACACAAATTA GGTGGAATAA TGACTCGAGC AGGAAAGCAG 
3051 TAGAAGAGGA TTCCTTCACC ACAGTTTCAC CTTTACGCTG TCAGACAACT 
3101 TCCTGCCACA GAAGAAGTCG TTGATGTGAT TTTTGAGGAA ATGACAGATG 
3151 TGACTTTGGA ACCAAACTTG TAATAAAAGG AATTCCAAAT GGAAAAAAAA 
3201 AAAAA 


BLAST Results 


No BLAST result 


Medline entries 


90301500: 

Cloning and sequencing of a zinc finger cDNA expressed in mouse testis. 

92310982: 

Zfp-37, a new murine zinc finger encoding gene, is expressed in a 
developmentally regulated pattern in the male germ line. 


Peptide information for frame 1 


ORF from 472 bp to 3018 bp; peptide length: 849 
Category: similarity to known protein 


1 MSQTNFTPDT LAQNEGKAMS YQCSLCKFLS SSFSVLKDHI KQHGQQNEVI 
51 LMCSECHITS RSQEELEAHV VNDHDNDANI HTQSKAQQCV SPSSSLCRKT 
101 TERNETIPDI PVSVDNLQTH TVQTASVAEM GRRKWYAYEQ YGMYRCLFCS 
151 YTCGQQRMLK THAWKHAGEV DCSYPIFENE NEPLGLLDSS AAAAPGGVDA 
201 WIAIGESEL SIHNGPSVQV QICSSEQLSS SSPLEQSAER GVHLSQSVTL 
251 DPNEEEMLEV ISDAEENLIP DSLLTSAQKI ISSSPNKKGH VNVIVERLPS 
301 AEETLSQKRF LMNTEMEEGK DLSLTEAQIG REGMDDVYRA DKCTVDIGGL 
351 IIGWSSSEKK DELMNKGLAT DENAPPGRRR TNSESLRLHS LAAEALVTMP 
401 IRAAELTRAN LGHYGDINLL DPDTSQRQVD STLAAYSKMM SPLKNSSDGL 
451 TSLNQSNSTL VALPEGRQEL SDGQVKTGIS MSLLTVIEKL RERTDQNASD 
501 DDILKELQDN AQCQPNSDTS LSGNNVVEYI PNAERPYRCR LCHYTSGNKG 
551 YIKQHLRVHR QRQPYQCPIC EHIADNSKDL ESHMIHHCKT RIYQCKQCEE 
601 SFHYKSQLRN HEREQHSLPD TLSIATSNEP RISSDTADGK CVQEGNKSSV 
651 QKQYRCDVCD YTSTTYVGVR NHRRIHNSDK PYRCSLCGYV CSHPPSLKSH 
701 MWKHASDQNY NYEQVNKAIN DAISQSGRVL GKSPGKTQLK SSEESADPVT 
751 GSSENAVSSS ELMSQTPSEV LGTNENEKLS PTSNTSYSLE KISSLAPPSM 
801 EYCVLLFCCC ICGFESTSKE NLLDHMKEHE GEIVNIILNK DHNTALNTN 

BLASTP hits 

Entry S10245 from database PIR: 
finger protein, testis - mouse 

Score = 265, P = 8.4e-23, identities = 61/205, positives = 91/205 

Entry S22954 from database PIR: 
finger protein zfp-37 - mouse 

Score = 265, P = 9.1e-22, identities = 61/205, positives = 91/205 

Entry AF031657_1 from database TREMBL: 

gene: "Zfp94"; product: "zinc-finger protein 94"; Rattus norvegicus 

zinc-finger protein 94 (Zfp94) gene, partial cds. 

Score = 243, P = 1.6e-21, identities = 57/190, positives = 85/190 


Alert BLASTP hits for DKFZphtes3_2e12, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2e12, frame 1 


Report for DKFZphtes3_2e12.1 


[LENGTH] 849 
[MW] 94325.42 
[pI] 5.47 
[HOMOL] PIR:A54661 zinc finger protein ZNF41 - human (fragment) 2e-22 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 04.03.01 tRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.01.01 rRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 4e-07 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YGL209w] 2e-04 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 2e-04 
[FUNCAT] 11.01 stress response [S. cerevisiae, YMR037c] 3e-04 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[SCOP] d1meyg 9.6.1.1.1 a designed zinc finger protein [syntheti 8e-06 
[PIRKW] nucleus 8e-18 
[PIRKW] RNA binding 5e-13 
[PIRKW] duplication 7e-13 
[PIRKW] tandem repeat 1e-21 
[PIRKW] spermatogenesis 6e-16 
[PIRKW] zinc 9e-21 
[PIRKW] zinc finger 1e-21 
[PIRKW] DNA binding 1e-21 
[PIRKW] metal binding 3e-15 
[PIRKW] phosphoprotein 5e-13 
[PIRKW] leucine zipper 1e-13 
[PIRKW] alternative splicing 6e-18 
[PIRKW] eye lens 2e-16 
[PIRKW] oocyte 1e-12 
[PIRKW] transcription factor 6e-18 
[PIRKW] segmentation 7e-13 
[PIRKW] embryo 1e-12 
[PIRKW] transcription regulation 2e-19 
[PIRKW] homeobox 2e-08 
[SUPFAM] POZ domain homology 7e-15 
[SUPFAM] transcription factor Krueppel 7e-13 
[SUPFAM] zinc finger protein ZFP-36 1e-21 
[SUPFAM] homeobox homology 2e-08 
[SUPFAM] unassigned homeobox proteins 2e-08 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 10 
[PROSITE] ZINC_FINGER_C2H2 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 18 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 7 
[PFAM] Zinc finger, C2H2 type 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.65 % 


SEQ MSQTNFTPDTLAQNEGKAMSYQCSLCKFLSSSFSVLKDHIKQHGQQNEVILMCSECHITS 
SEG xxxxxxxxxxxxxxx 
1meyF 

SEQ RSQEELEAHVVNDHDNDANIHTQSKAQQCVSPSSSLCRKTTERNETIPDIPVSVDNLQTH 
SEG 
1meyF 

SEQ TVQTASVAEMGRRKWYAYEQYGMYRCLFCSYTCGQQRMLKTHAWKHAGEVDCSYPIFENE 
SEG 
1meyF 

SEQ NEPLGLLDSSAAAAPGGVDAWIAIGESELSIHNGPSVQVQICSSEQLSSSSPLEQSAER 
SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 
1meyF 

SEQ GVHLSQSVTLDPNEEEMLEVISDAEENLIPDSLLTSAQKIISSSPNKKGHVNVIVERLPS 
SEG 
1meyF 

SEQ AEETLSQKRFLMNTEMEEGKDLSLTEAQIGREGMDDVYRADKCTVDIGGLIIGWSSSEKK 
SEG 
1meyF 

SEQ DELMNKGLATDENAPPGRRRTNSESLRLHSLAAEALVTMPIRAAELTRANLGHYGDINLL 
SEG 
1meyF 

SEQ DPDTSQRQVDSTLAAYSKMMSPLKNSSDGLTSLNQSNSTLVALPEGRQELSDGQVKTGIS 
SEG 
1meyF 

SEQ MSLLTVIEKLRERTDQNASDDDILKELQDKAQCQPNSDTSLSGNNVVEYIPNAERPYRCR 
SEG 
1meyF TTTEETT 

SEQ LCHYTSGNKGYIKQHLRVHRQRQPYQCPICEHIADNSKDLESHMIHHCKTRIYQCKQCEE 
SEG 
1meyF TTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCCCEEETTTTE 

SEQ SFHYKSQLRNHEREQHSLPDTLSIATSNEPRISSDTADGKCVQEGNKSSVQKQYRCDVCD 
SEG 
1meyF EECCHHHHHHHHHHHC 

SEQ YTSTTYVGVRNHRRIHNSDKPYRCSLCGYVCSHPPSLKSHMWKHASDQNYNYEQVNKAIN 
SEG 
1meyF 

SEQ DAISQSGRVLGKSPGKTQLKSSEESADPVTGSSENAVSSSELMSQTPSEVLGTNENEKLS 
SEG 
1meyF 

SEQ PTSNTSYSLEKISSLAPPSMEYCVLLFCCCICGFESTSKENLLDHMKEHEGEIVNIILNK 
SEG 
1meyF 

SEQ DHNTALNTN 
SEG 
1meyF 


Prosite for DKFZphtes3_2e12.1 


PS00001 104->108 ASN_GLYCOSYLATION PDOC00001 
PS00001 445->449 ASN_GLYCOSYLATION PDOC00001 
PS00001 454->458 ASN_GLYCOSYLATION PDOC00001 
PS00001 457->461 ASN_GLYCOSYLATION PDOC00001 
PS00001 497->501 ASN_GLYCOSYLATION PDOC00001 
PS00001 646->650 ASN_GLYCOSYLATION PDOC00001 
PS00001 784->788 ASN_GLYCOSYLATION PDOC00001 
PS00004 98->102 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 59->62 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005 
PS00005 357->360 PKC_PHOSPHO_SITE PDOC00005 
PS00005 385->388 PKC_PHOSPHO_SITE PDOC00005 
PS00005 425->428 PKC_PHOSPHO_SITE PDOC00005 
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005 
PS00005 696->699 PKC_PHOSPHO_SITE PDOC00005 
PS00005 726->729 PKC_PHOSPHO_SITE PDOC00005 
PS00005 817->820 PKC_PHOSPHO_SITE PDOC00005 
PS00006 62->66 CK2_PHOSPHO_SITE PDOC00006 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006 
PS00006 232->236 CK2_PHOSPHO_SITE PDOC00006 
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006 
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006 
PS00006 314->318 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006 
PS00006 381->385 CK2_PHOSPHO_SITE PDOC00006 
PS00006 485->489 CK2_PHOSPHO_SITE PDOC00006 
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006 
PS00006 617->621 CK2_PHOSPHO_SITE PDOC00006 
PS00006 626->630 CK2_PHOSPHO_SITE PDOC00006 
PS00006 741->745 CK2_PHOSPHO_SITE PDOC00006 
PS00006 758->762 CK2_PHOSPHO_SITE PDOC00006 
PS00006 766->770 CK2_PHOSPHO_SITE PDOC00006 
PS00006 817->821 CK2_PHOSPHO_SITE PDOC00006 
PS00007 331->339 TYR_PHOSPHO_SITE PDOC00007 
PS00007 703->711 TYR_PHOSPHO_SITE PDOC00007 
PS00007 596->605 TYR_PHOSPHO_SITE PDOC00007 
PS00008 142->148 MYRISTYL PDOC00008 
PS00008 185->191 MYRISTYL PDOC00008 
PS00008 196->202 MYRISTYL PDOC00008 
PS00008 241->247 MYRISTYL PDOC00008 
PS00008 349->355 MYRISTYL PDOC00008 
PS00008 [illegible] MYRISTYL PDOC00008 
PS00008 478->484 MYRISTYL PDOC00008 
PS00008 645->651 MYRISTYL PDOC00008 
PS00008 751->757 MYRISTYL PDOC00008 
PS00008 772->778 MYRISTYL PDOC00008 
PS00009 130->134 AMIDATION PDOC00009 
PS00009 376->380 AMIDATION PDOC00009 
PS00028 146->167 ZINC_FINGER_C2H2 PDOC00028 
PS00028 684->705 ZINC_FINGER_C2H2 PDOC00028 
PS00028 595->617 ZINC_FINGER_C2H2 PDOC00028 
PS00190 53->59 CYTOCHROME_C PDOC00169 


Pfam for DKFZphtes3_2e12.1 


HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 

C-^+ C+ T R++++L++H H 
Query 53 CSE--CHITSRSQEELEAHVVN-DH 74 

23.25 (bits) f: 539 t: 559 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C C-H+T ++H-KR-f-H 

dkfzphtes3 539 CRL--CHYTSGNKGYIKQHLRVH 559 

Query f: 567 t: 587 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

CP+ C+ ++ +L+ HM-l- H 
Query 567 CPI--CEHIADNSKDLESHMIHH 587 

33.47 (bits) f: 595 t: 616 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR.T.H* 

C+ C+-H-F ++S-J-LR-HH R H 

dkfzphtes3 595 CKQ--CEESFHYKSQLRNHERE-QH 616 

Query f: 656 t: 676 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C-^+ C++T R+H-»-R+H 
Query 656 CDV--CDYTSTTYVGVRNHRRIH 676 

24.53 (bits) f: 684 t: 704 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C+ CG++ +++ +L+ HM H 

dkfzphtes3 684 CSL--CGYVCSHPPSLKSHMWKH 704 

Query f: 809 t: 829 Target: dkfzphtes3_2e12.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C + CG ++++NL HM+ H 
Query 809 CCI--CGFESTSKENLLDHMKEH 829 
DKFZphtes3_2f14 


group: testes derived 

DKFZphtes3_2f14 encodes a novel 129 amino acid protein with very weak similarity to human 
omega protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


weak similarity to omega protein 

complete cDNA, complete cds, 1 EST hit 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2353 bp 

Poly A stretch at pos. 2341, no polyadenylation signal found 


1 GCAGATTCTC CAGGCCCAGC ATCTGCCTCA CCGTGGCCCC CCACAAGCCA 
51 AGCGCCTGCC TTTCAGCAGC CTCTACACAC CCAGCTCCTG CCACCCAATG 
101 GCTCTTTAGG CCAAGCTCAT ACCTCACGAT GATTTTTCCA GGCCCAACTT 
151 TTGTCTCATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT TTCCTGGCAG 
201 CCTGGACAGG CCCAGGTCCT GCCACACACT GGCCTCTCTA CGCCCAGCTC 
251 ATGCCTCACA GTGGCCTCTC CAGGCCCAGC TCCTGTCCCG GGACATCATC 
301 TCCAGGCCCA AAACTTCCTC AAGTCGGCCT CTCCAGGCCC AGTTGCTGCC 
351 TCCCGGCATT CTCTCCAGGC CTAGCTCTTC CTCCTGGCTG TATCTACAAG 
401 ACCAACTCCT GCCTCACAAC AACCTTTTAT GGCTCAGCTC CTGCCCAACT 
451 ACTGCCGGCC TTTGTAGGCC CAAAACTTCC TCAAGTCAAG CTCTTTAGGC 
501 CCACCTTCTG CCTTGCAGTG GCCTGTACAG ACCCAGCTCT GGCTTGAGAA 
551 CAGCCTCTGC AGGCCCTGCT CTTGCCTCTT AGCTCCCTCT CCAGGCCCAT 
601 CTCTTGCCTC ACAGTGGCTT CCGTGGGCCA AGTTCCCGCC TGCCTCCCAG 
651 CAGCCTCAAC AGGCCTAGCT CCTCCCTCAC AATGGCTTGT TTAGGTCCAG 
701 TTGATGCCTC TGGCAACCTG TCCAGGCCCA GCTCCTGCCT CACACTGGCC 
751 TCTCTAGGCC GAGGTCCTTT CTCATACTGG CCTGTTTAGG CCCAGCTCAT 
801 TCCTCTTGTC ATCTCTCCAG GCCCAGCTTT TGCCTGTTGT TGGCCTCTAC 
851 CTCACAGTGC ACCTTCCAGT CCCACCTCTT GCCTCACCAT GGCCTCCTCT 
901 GACCAGGTTC CTGCCTTTCG GCAGCCTCTA CAGGCCTAGC TGCTGCCTCC 
951 CAATGGCCTT TGTAGGCCAC GCTCATGCCT CACTGTGGCC TTTCCAGGCC 
1001 TAGCTTTCGC TTTTTGGCCA CTCCAGGCCC AGAACTTCCC CCAGTCAGCC 
1051 TCTCCAGGCC CAGCTCTTCC TCCCAGCAAC CTCTGCAGGC CCAAATCATC 
1101 CTCAAATTGG CCTCTTCTTT CCCAGGTCCT GCCTCCTGGT GGCCTCTGAA 
1151 GACCCAAATC GTCCTCCAGT TGGTTTTTCC AGGCCCAGCT CCTGCCTTTT 
1201 GGTGGCCTCT CCAGGTGCAA AACTTCCTCC CATCAGCCTG TCCAGGCCCA 
1251 GCTCATGCCT CTTGGTGGCC TTCTCAGGCC CTGCTTTTGA CTTGGTGGCC 
1301 TCTTCAGGCC CAGAACTTGA ACTCAAGTCA GCCTCTCCAG GCCCAGCTCC 
1351 TGCCTTCTTA AGGTCTGTAC AGGCCCAGCC TCTACCTCAC AGCGGACTCT 
1401 CCACACCCAG CTCTTGCCTC ACTGTAGCCT CCCCAGTCCA AAACTCCTGC 
1451 CTTTTGGCAG CTTCGACAAG CCCAGGTCCT GCCTTTCAAT GACCTCTTTA 
1501 GGCCCCGCTC ATTCCTTACA ACGGCCTTTC CAGGCCCAGT TTTTCCCTTT 
1551 TGGCGGCCTC TCCAGGCCCA GAACTTCCTC AAGTCGGCCT CTTTAGGCCC 
1601 AGTTGCTGCC TCCTGGCATC CTCTGCAGGC CGAGCTCTTC CTCCCTGCTG 
1651 TGTCTACAGG CCCAACTCCT GCCTCACAAC AACCTCCTTG GACTCAGCTT 
1701 CTGCCCAGCT CCTGGTGGCC TTTGTAGGCT CAAAATTTTC TCAAATCAAG 
1751 CTCTCCAGGC CTACTGTCAG CCTCGTGGCA GCCTAAACAG GCCCAGCTCC 
1801 TGCCTGACAA TGGCCTCTCC AGGCTTTTCT CCTGCCTCGC AGCAGGCTTT 
1851 CCAGGCCCAG CTCTTGCCTC ATGGTGGCCT TCCCCGGCCA TGTTCCTATC 
1901 TGACTTCTGG CAGCCTCAAC CGGCCCAGCT TCTGCCTCAC ACTGGCCTCT 
1951 CTAGGCCCAG CTCCTTTTTC ACAGTGGCCT CACTACGCCC ATCTCCTACC 
2001 TCAGATCTGC CTCCCAAGAC CCAGCTCCTG TCTCATGGTG GTCTCTCTTA 
2051 CACCAGCTCC TGCCTCACAA TGGCCTCGTC TGGCCCATCT TCTGCCTCAC 
2101 AGTGGCCACT CAAGGCCCAT CTTTTGCCTC ATGGTAGCCT CTTCTGGTTT 
2151 TGCTCTTGCC TCACAGTTGC CTCTTCCAGA TCCAGCTTTA AGCCTTTGAT 
2201 GGTCAACAGC ATCAAGGAGC CTAAAGCTTC CCTGGACTCT CATTTGTTCA 
2251 CTTTACAGCA GAGTGCCTTA GCAAAAACTG TCTCTTAACC TTGAGAGTGG 
2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA 
2351 AAA 
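
The numbered ten-base layout used for the nucleotide sequences lends itself to a simple consistency check: each row label should equal the number of bases read so far plus one. The following Python sketch (a hypothetical helper, not part of the original analysis) joins such rows into one string and raises an error when a label is out of step, which is how a split or misread label would surface.

```python
def join_numbered_blocks(lines, first_position=1):
    """Concatenate 'position block block ...' rows into one sequence string.

    Raises ValueError when a row's position label disagrees with the number
    of bases read so far, which flags dropped rows or damaged labels.
    """
    parts = []
    next_position = first_position
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        label, blocks = int(fields[0]), fields[1:]
        if label != next_position:
            raise ValueError(f"expected position {next_position}, found {label}")
        chunk = "".join(blocks).upper()
        parts.append(chunk)
        next_position += len(chunk)
    return "".join(parts)


# Last two rows of the sequence above, starting at position 2301.
tail = join_numbered_blocks(
    [
        "2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA",
        "2351 AAA",
    ],
    first_position=2301,
)
print(2300 + len(tail))  # 2353, matching the stated insert length
```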


BLAST Results 


No BLAST result 
Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 158 bp to 544 bp; peptide length: 129 
Category: similarity to known protein 


1 MATFPGQVST YFLAAWTGPG PATHWPLYAQ LMPHSGLSRP SSCPGTSSPG 
51 PKLPQVGLSR PSCCLPAFSP GLALPPGCIY KTNSCLTTTF YGSAPAQLLP 
101 AFVGPKLPQV KLFRPTFCLA VACTDPALA 

BLASTP hits 

Entry 170697 from database PIR: 
omega protein - human (fragment) 

Score = 79, P = 2.8e-03, identities = 32/94, positives = 38/94 


Alert BLASTP hits for DKFZphtes3_2f14, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2f14, frame 2 

Report for DKFZphtes3_2f14.2 

[LENGTH] 129 

[MW] 13421.76 

[pI] 9.14 

[PROSITE] MYRISTYL 2 

[KW] Irregular 

[KW] LOW_COMPLEXITY 10.85 % 

SEQ MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR 

SEG xxxxxxxxxxxxxx 

PRD cccccccceeehhhhhcccccccccccccccccccccccccccccccccccccccccccc 

SEQ PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA 

SEG 

PRD cccccccccccccccccccccccccceeeccccccccccccccccccccccccccccccc 

SEQ VACTDPALA 

SEG 

PRD ccccccccc 


Prosite for DKFZphtes3_2f14.2 

PS00008 6->12 MYRISTYL PDOC00008 

PS00008 92->98 MYRISTYL PDOC00008 
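
The two MYRISTYL hits listed above can be reproduced with a regular expression. The sketch below assumes the PROSITE PS00008 pattern G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} and assumes that the listing reports the 1-based start together with an exclusive end (start plus pattern length); both the helper name and that reading of the coordinates are assumptions, not something the report states.

```python
import re

# PROSITE PS00008 (N-myristoylation site): G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
# The lookahead lets overlapping matches be reported as well.
MYRISTYL = re.compile(r"(?=(G[^EDRKHPFYW]..[STAGCN][^P]))")

# 129-residue peptide of frame 2, as listed above.
peptide = (
    "MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPG"
    "PKLPQVGLSRPSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLP"
    "AFVGPKLPQVKLFRPTFCLAVACTDPALA"
)


def scan(pattern, sequence):
    """Return (start, end) pairs: 1-based start, exclusive end."""
    hits = []
    for match in pattern.finditer(sequence):
        start = match.start(1) + 1
        hits.append((start, start + len(match.group(1))))
    return hits


print(scan(MYRISTYL, peptide))  # [(6, 12), (92, 98)]
```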


(No Pfam data available for DKFZphtes3_2f14.2) 
DKFZphtes3_2g7 


group: testes derived 

DKFZphtes3_2g7 encodes a novel 359 amino acid protein with similarity to neurofilament 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to neurofilament proteins 

complete cDNA, complete cds, 6 EST hits (5 hits are out of a testis 
library) 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1613 bp 

Poly A stretch at pos. 1595, polyadenylation signal at pos. 1557 
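
The poly A and polyadenylation signal positions can be located by a simple search of the 3' end. The sketch below assumes the canonical AATAAA hexamer and assumes that the reported positions are 0-based offsets into the insert; that reading is an inference from the fact that it reproduces the two figures quoted above, and the function name is hypothetical.

```python
SIGNAL = "AATAAA"  # canonical polyadenylation signal (assumed)


def three_prime_features(tail, tail_offset):
    """Return (signal_offset, poly_a_offset) as 0-based offsets into the insert.

    tail        -- the 3'-terminal stretch of the cDNA, written 5'->3'
    tail_offset -- 0-based offset of tail[0] within the full insert
    """
    body = tail.rstrip("A")  # everything before the terminal A run
    signal_index = body.find(SIGNAL)
    signal = tail_offset + signal_index if signal_index >= 0 else None
    poly_a = tail_offset + len(body) if len(body) < len(tail) else None
    return signal, poly_a


# Bases 1551 to 1613 of the 1613 bp DKFZphtes3_2g7 insert, as listed in this report.
tail = "".join([
    "TGCTACAAAT", "AAATATTACT", "GGAAATGATA", "TTTCCATTTG",
    "TAGTTAAAAA", "AAAAAAAAAA", "AAA",
])
print(three_prime_features(tail, 1550))  # (1557, 1595)
```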


1 GCCACACAGG CTCCTTGGAG TAAGAGTGTG AGAAACTGGA TGAAGACAGC 
51 TGTATTCTTT TGGAAGCGTT CGAGATTGGT CTGTCTCTAC CAACTAAAAA 
101 CTTCTAGCTT AAGTGCAGAG ATTTAAGGAG ATCAACAAAA ACTCAGTCTA 
151 GACATATTAT GAGGCTGGGA GGGTATCAAC AGACTTGAGT TCTTGTCAGC 
201 AAGATCACCT GCTTTTAATA TTGTCCTCAG GGTCTGAGCA CATCTGGAAG 
251 TGAGGTCAAT CAAGTTAGAC CCCAAAAACT TTTGTGACAA CAGTGAAGAG 
301 GGGAAAATAA ACACACCACA AACATGAACC TCAACCCCCC GACATCTGCT 
351 CTTCAGATCG AGGGCAAAGG CAGCCATATT ATGGCTAGAA ATGTAAGCTG 
401 CTTTCTAGTC AGGCACACCC CTCATCCCAG AAGAGTCTGC CACATCAAAG 
451 GCTTGAATAA CATTCCAATC TGTACTGTGA ATGATGATGA GAATGCATTT 
501 GGAACATTGT GGGAAGTTGG CCAGTCTAAC TACTTAGAGA AGAACAGGAT 
551 ACCATTTGCC AATTGCAGTT ACCCCCCGAG CACTGCAGTC CAGAAGAGCC 
601 CTGTAAGAGG AATGTCGCCA GCCCCAAACG GTGCCAAAGT GCCTCCACGG 
651 CCTCATTCTG AGCCCAGTAG AAAAATTAAA GAGTGCTTCA AAACTTCCAG 
701 TGAGAATCCC TTAGTAATTA AAAAGGAAGA AATTAAGGCC AAAAGACCAC 
751 CATCACCTCC AAAGGCATGC TCTACTCCTG GCTCCTGTTC TTCAGGGATG 
801 ACAAGTACCA AGAATGATGT GAAAGCAAAC ACCATTTGCA TACCAAACTA 
851 TCTGGATCAG GAAATAAAAA TCCTGGCAAA GCTCTGTAGC ATTTTGCATA 
901 CTGATTCTCT GGCAGAAGTT TTACAGTGGC TGCTTCATGC AACTTCAAAA 
951 GAAAAAGAGT GGGTCTCAGC TTTGATTCAT TCTGAGCTTG CCGAGATAAA 
1001 CCTGTTAACT CATCACAGAA GAAACACCTC AATGGAACCA GCAGCAGAGA 
1051 CTGGGAAGCC ACCCACAGTT AAATCACCAC CCACAGTTAA ATTGCCCCCA 
1101 AATTTTACTG CAAAATCAAA AGTGCTGACC AGAGATACAG AAGGGGATCA 
1151 ACCAACCAGA GTGTCAAGTC AAGGATCTGA AGAAAACAAG GAAGTACCAA 
1201 AAGAGGCTGA GCACAAGCCT CCACTACTTA TAAGAAGAAA TAATATGAAA 
1251 ATACCTGTTG CAGAATATTT CAGCAAACCA AATTCTCCTC CCAGGCCTAA 
1301 CACTCAGGAG AGTGGATCAG CAAAACCAGT GTCAGCAAGG AGTATACAAG 
1351 AATACAACCT CTGTCCCCAA AGAGCATGTT ATCCTTCAAC ACACCGGAGG 
1401 TAGAAGTTCT AGACTGGGTG AATTCTTTCA TGAATATGAG CTTCACATTT 
1451 ACATCATCAA ATTATTTTTC AAATGAATAT TTTTGGTATT GAGGAATCAA 
1501 GTGGTCCTCT TTATGGTGGC ACATGTAAAT CTAAAAATAC CTGTATGTAA 
1551 TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA 
1601 AAAAAAAAAA AAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 324 bp to 1400 bp; peptide length: 359 
Category: similarity to known protein 
1 MNLNPPTSAL QIEGKGSHIM ARNVSCFLVR HTPHPRRVCH IKGLNNIPIC 
51 TVNDDENAFG TLWEVGQSNY LEKNRIPFAN CSYPPSTAVQ KSPVRGMSPA 
101 PNGAKVPPRP HSEPSRKIKE CFKTSSENPL VIKKEEIKAK RPPSPPKACS 
151 TPGSCSSGMT STKNDVKANT ICIPNYLDQE IKILAKLCSI LHTDSLAEVL 
201 QWLLHATSKE KEWVSALIHS ELAEINLLTH HRRNTSMEPA AETGKPPTVK 
251 SPPTVKLPPN FTAKSKVLTR DTEGDQPTRV SSQGSEENKE VPKEAEHKPP 
301 LLIRRNNMKI PVAEYFSKPN SPPRPNTQES GSAKPVSARS IQEYNLCPQR 
351 ACYPSTHRR 

BLASTP hits 
Entry A43427 from database PIR: 

neurofilament triplet H1 protein - rabbit (fragment) 

Score = 118, P = 5.6e-04, identities = 79/290, positives = 110/290 

Entry RNNFH_1 from database TREMBL: 

Rat high molecular weight neurofilament (NF-H) protein mRNA, 3' end. 
Score = 115, P = 9.5e-04, identities = 69/281, positives = 100/281 

Entry B43427 from database PIR: 

neurofilament protein H form H2 (repetitive region) - rabbit (fragment) 
Score = 111, P = 1.3e-03, identities = 64/269, positives = 102/269 


Alert BLASTP hits for DKFZphtes3_2g7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2g7, frame 3 


Report for DKFZphtes3_2g7.3 


[LENGTH] 359 
[MW] 39725.53 
[pI] 9.45 
[PROSITE] MYRISTYL 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 4 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.16 % 


SEQ MNLNPPTSALQIEGKGSHIMARNVSCFLVRHTPHPRRVCHIKGLNNIPICTVNDDENAFG 

SEG 

PRD ccccccccceeecccccceeeeccceeeeecccccccccccccccccccccccccccccc 

SEQ TLWEVGQSNYLEKNRIPFANCSYPPSTAVQKSPVRGMSPAPNGAKVPPRPHSEPSRKIKE 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ CFKTSSENPLVIKKEEIKAKRPPSPPKACSTPGSCSSGMTSTKNDVKANTICIPNYLDQE 

SEG 

PRD hcccccccceeeehhhhhhccccccccccccccccccccccccccccceeeeccccchhh 

SEQ IKILAKLCSILHTDSLAEVLQWLLHATSKEKEWVSALIHSELAEINLLTHHRRNTSMEPA 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

SEQ AETGKPPTVKSPPTVKLPPNFTAKSKVLTRDTEGDQPTRVSSQGSEENKEVPKEAEHKPP 

SEG .... xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeecccccccceeeeccccccccccccccccccc 

SEQ LLIRRNNMKIPVAEYFSKPNSPPRPNTQESGSAKPVSARSIQEYNLCPQRACYPSTHRR 

SEG 

PRD eeeeccccccceeeeecccccccccccccccccccchhhhhhccccccccccccccccc 


Prosite for DKFZphtes3_2g7.3 

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001 
PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 161->164 PKC_PHOSPHO_SITE PDOC00005 
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 262->265 PKC_PHOSPHO_SITE PDOC00005 
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005 
PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005 
PS00005 356->359 PKC_PHOSPHO_SITE PDOC00005 
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006 
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006 
PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 207->211 CK2_PHOSPHO_SITE PDOC00006 
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 
PS00006 272->276 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00008 153->159 MYRISTYL PDOC00008 
PS00008 158->164 MYRISTYL PDOC00008 
PS00008 284->290 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphtes3_2g7.3) 
DKFZphtes3_2h1 


group: transmembrane protein 

DKFZphtes3_2h1 encodes a novel 116 amino acid protein with weak similarity to C. elegans 
cosmid C13F10. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 

similarity to C.elegans C13F10.5 
TRANSMEMBRANE 1 
Sequenced by EMBL 
Locus: /map="2" 
Insert length: 1156 bp 

Poly A stretch at pos. 1143, polyadenylation signal at pos. 1121 


1 GGCCATCAAA ATAACTAAAC CATGTCATTT GGAGCAACAA AGCCACTGCG 
51 GCCTCCATTT GGGCCAAGCT CTGACTGCAA TGATGCCTCT GCCCCGACCC 
101 GGGCCTCGCT GTGACTGACA ATGCCGCTGC ATCTTTTCAG CAGTCATTGA 
151 TGAGGAAGTA TCTACATCCT CCTTCCCACT ACCAGATTTT GCTTGGAGAA 
201 AAGCAGTTTC CTGAAATAAT TCTGTGACGA GCTTCTTCCA CATTAGGACA 
251 AAAATGCTGG AAGCGGCTCA GCCCCAGGGC AGCACATCAG AGACACCATG 
301 GAACACAGCC ATTCCTCTGC CGTCGTGCTG GGACCAGTCT TTCCTGACCA 
351 ATATCACCTT CTTGAAGGTT CTTCTCTGGT TGGTCCTGCT GGGACTGTTT 
401 GTGGAACTGG AATTTGGCCT GGCATATTTT GTCCTGTCCT TGTTCTATTG 
451 GATGTACGTC GGGACACGAG GCCCTGAAGA GAAGAAAGAG GGAGAGAAGA 
501 GCGCCTACTC TGTGTTCAAT CCAGGCTGTG AAGCCATCCA GGGCACCCTG 
551 ACTGCAGAGC AGTTGGAGCG CGAGTTACAG TTGAGACCCC TGGCAGGGAG 
601 ATAGGACCCA GCTGTGCTGT CATGCAGCTA ACCTCTGATG TGGTCTTCCT 
651 CACCATTGGC TATGGATTTG ATTTCAGGTG TATAGGACTA AGGGCAGCTT 
701 GCGGGTTAGC TCTGTGACTG CATAGTTTTT CTACCTTCTT TCCCTGATCT 
751 TTTGCTGCCA TTTGATCTTT GATAGTTTTG GTGAAACTCT CTAAAATACA 
801 TTCACTGTGG GTCCGACGCA ATTTATAAAA ATTATGTACT CAAGAAGGGA 
851 GACCTGTTTG TTTCATTTCT CATCTGTTTG GGAGATGATT TTAGAGCACT 
901 AGAAAGGCAC TGGGGAGATT CTCAGCTTAA AACATCCAGC AGTTTGAAGT 
951 ATGATTAGGT ACATCAGGGC TGCATTGTCA ATGTTCTCTT TAAGTCTTTT 
1001 AACATTTATA GCAATTTTTT TTTTCCCGGA GAGTTTAGGT TGCAAGTTTT 
1051 GGGTTTCTTG TTTGTTTTTG TTTTGCTTCC TGCTTTAATT CTTTAATTTT 
1101 CAGTCATTAC TGGTATTGAA AAATAAAATA TCTTTAAAAC ATCAAAAAAA 
1151 AAAAAA 


BLAST Results 

Entry HS313307 from database EMBL: 
human STS SHGC-16715. 
Score = 1222, P = 1.4e-48, identities = 248/251 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 254 bp to 601 bp; peptide length: 116 
Category: similarity to unknown protein 


1 MLEAAQPQGS TSETPWNTAI PLPSCWDQSF LTNITFLKVL LWLVLLGLFV 
51 ELEFGLAYFV LSLFYWMYVG TRGPEEKKEG EKSAYSVFNP GCEAIQGTLT 
101 AEQLERELQL RPLAGR 
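
The relationship between the stated ORF start and the peptide can be illustrated by translating the first codons. The sketch below uses the standard genetic code and the 27 bases that begin at position 254 of the insert, read from the nucleotide listing for this clone; the helper name is illustrative and not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 254 to 280 of the DKFZphtes3_2h1 insert.
orf_start = "ATGCTGGAAGCGGCTCAGCCCCAGGGC"
print(translate(orf_start))  # MLEAAQPQG
```

The output matches the first nine residues of the peptide listed above.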
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2h1, frame 2 

TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10., N = 1, Score = 141, P = 8.2e-10 

>TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
Length = 171 

HSPs: 

Score = 141 (21.2 bits), Expect = 8.2e-10, P = 8.2e-10 
Identities = 32/82 (39%), Positives = 52/82 (63%) 

Query: 27 DQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYS 86 
CK' . t^^ ++ T + V++++V L ++FG +F+LSL + Y T G ++ GE SAYS 

Sbjct: 90 EQSVVS~TRIAVWYVVGQALAAWVQFGAVFFILSLILFTYWNT-G— RRRRGEMSAYS 144 

Query: 87 VFNPGCEAIQGTLTAEQLEREL 108 

VFN CE + G++TAE ER+ + 
Sbjct: 145 VFNDNCERLAGSMTAEHFERDM 166 
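
The percentages in the HSP summary follow directly from the match counts. In the sketch below the values are truncated rather than rounded; that is an inference from the figures reported in this listing (for example, 186/383 is shown as 48%), not a documented property of the program that produced them.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage as it appears in the HSP summaries (truncated)."""
    return matches * 100 // aligned_length


print(blast_percent(32, 82))    # identities: 39
print(blast_percent(52, 82))    # positives: 63
print(blast_percent(186, 383))  # 48, where rounding would give 49
```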

Pedant information for DKFZphtes3_2h1, frame 2 

Report for DKFZphtes3_2h1.2 

[LENGTH] 116 

[MW] 13092.19 

[pI] 4.64 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 32.76 % 


SEQ MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV 
SEG xxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhchhhhh 
MEM HMMMMMMMMMMMMMMMM 

SEQ LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR 
SEG xxxxxxxxxxxxxxxxx 
PRD hhhhhhhhcccccchhhhhcccceeeecccccccccccchhhhhhhhhhccccccc 
MEM 


Prosite for DKFZphtes3_2h1.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00007 78->86 TYR_PHOSPHO_SITE PDOC00007 
PS00007 77->86 TYR_PHOSPHO_SITE PDOC00007 
PS00008 97->103 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphtes3_2h1.2) 
DKFZphtes3_2h15 


group: testes derived 

DKFZphtes3_2h15 encodes a novel 855 amino acid protein with very weak similarity to S. pombe 
cdc23. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to cdc23 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4619 bp 

Poly A stretch at pos. 4598, polyadenylation signal at pos. 4589 


1 GAAGGCGTCC CGGCATCGGC CAAGATTCTA CATTGCTCAT CTGGGCATCT 
51 GAGCCTCCTT CGAAGTTTCC TGTCACAACT GTCCTCTTGA CAGCATGGAT 
101 GAGGAGGAAG ACAATCTGTC TCTGCTGACC GCACTGCTGG AAGAAAATGA 
151 GTCAGCCTTG GATTGTAATT CAGAAGAAAA TAACTTCTTG ACGCGGGAAA 
201 ATGGCGAGCC CGACGCATTT GATGAGCTCT TTGATGCCGA CGGCGACGGT 
251 GAATCTTATA CAGAAGAGGC TGATGATGGA GAAACAGGAG AGACAAGAGA 
301 CGAAAAGGAA AATCTGGCCA CTCTCTTTGG AGATATGGAG GACTTAACAG 
351 ATGAAGAAGA AGTTCCCGCA TCACAGTCAA CTGAAAATAG GGTCCTCCCT 
401 GCTCCTGCCC CCAGGCGAGA GAAAACGAAT GAAGAGTTGC AAGAGGAATT 
451 AAGGAATTTG CAAGAGCAAA TGAAGGCCTT ACAAGAGCAG CTAAAAGTAA 
501 CAACAATTAA ACAGACAGCA AGCCCAGCCC GTCTGCAAAA ATCCCCTGAG 
551 AAGTCTCCCC GGCCACCTCT TAAGGAGAGG AGAGTTCAGA GAATTCAGGA 
601 GTCAACATGC TTTTCTGCGG AGCTTGATGT CCCTGCGCTA CCAAGAACCA 
651 AGAGGGTGGC TCGAACACCA AAGCCTTCAC CTCCAGATCC CAAAAGCTCA 
701 TCTTCAAGGA TGACAAGTGC ACCCTCCCAA CCCCTACAGA CGATTTCTCG 
751 GAACAAACCT AGTGGGATAA CTAGAGGTCA AATTGTGGGG ACCCCAGGAA 
801 GTTCTGGGGA AACGACTCAA CCCATCTGTG TGGAAGCCTT CTCTGGTCTG 
851 CGGCTCAGGC GGCCTCGAGT ATCCTCCACA GAAATGAACA AGAAAATGAC 
901 CGGCCGAAAA CTGATCAGAC TGTCTCAGAT CAAGGAAAAG ATGGCCAGAG 
951 AGAAGCTGGA AGAAATAGAT TGGGTGACAT TTGGGGTTAT ATTGAAGAAG 
1001 GTTACGCCAC AGAGTGTGAA TAGTGGAAAA ACCTTCAGCA TATGGAAACT 
1051 GAATGATCTT CGTGACCTGA CACAATGTGT GTCCTTGTTC TTATTTGGAG 
1101 AAGTTCACAA AGCGCTCTGG AAGACGGAGC AGGGGACTGT CGTAGGGATC 
1151 CTCAATGCCA ACCCCATGAA GCCCAAGGAT GGTTCAGAGG AGGTGTGTTT 
1201 ATCTATCGAT CATCCTCAGA AGGTCTTAAT TATGGGTGAA GCTCTTGACC 
1251 TGGGAACCTG TAAAGCCAAG AAGAAGAATG GAGAGCCGTG CACGCAGACT 
1301 GTGAATTTGC GTGACTGTGA GTACTGTCAG TACCATGTCC AGGCTCAGTA 
1351 CAAGAAGCTC AGTGCAAAGC GTGCGGATCT GCAGTCCACC TTCTCTGGAG 
1401 GACGAATTCC AAAGAAGTTT GCCCGCAGAG GCACCAGCCT CAAAGAACGG 
1451 CTGTGCCAAG ATGGCTTTTA CTACGGAGGG GTTTCTTCTG CCTCGTATGC 
1501 AGCTTCAATT GCAGCAGCTG TGGCTCCTAA GAAGAAGATT CAAACCACTC 
1551 TGAGTAATCT GGTTGTTAAG GGCACAAACT TGATCATCCA GGAAACACGG 
1601 CAAAAACTCG GAATACCCCA GAAGAGCCTG TCTTGCTCTG AGGAGTTCAA 
1651 GGAACTGATG GACCTGCCGA CGTGTGGAGC CAGGAACTTA AAACAACATT 
1701 TAGCCAAAGC CTCAGCTTCA GGGATTATGG GGAGCCCAAA ACCAGCCATC 
1751 AAGTCCATCT CGGCCTCAGC ACTCTTGAAG CAACAGAAGC AGCGGATGTT 
1801 GGAGATGAGG AGAAGGAAAT CAGAAGAAAT ACAGAAGCGA TTTCTGCAGA 
1851 GCTCAAGTGA AGTTGAGAGC CCAGCTGTGC CATCTTCATC AAGACAGCCC 
1901 CCTGCTCAGC CTCCACGGAC AGGATCCGAG TTCCCCAGGC TGGAGGGAGC 
1951 CCCGGCCACA ATGACGCCCA AGCTGGGGCG AGGTGTCTTG GAAGGAGATG 
2001 ATGTTCTCTT TTATGATGAG TCACCACCAC CAAGACCAAA ACTGAGTGCT 
2051 TTAGCAGAAG CCAAAAAGTT AGCTGCTATC ACCAAATTAA GGGCAAAAGG 
2101 CCAGGTTCTT ACAAAAACAA ACCCAAACAG CATTAAGAAG AAACAAAAGG 
2151 ACCCTCAGGA CATCCTGGAG GTGAAGGAAC GTGTAGAAAA AAACACCATG 
2201 TTTTCTTCTC AAGCTGAGGA TGAATTGGAG CCTGCCAGGA AAAAAAGGAG 
2251 AGAACAACTT GCCTATCTGG AATCTGAGGA ATTTCAGAAA ATCCTAAAAG 
2301 CAAAATCAAA ACACACAGGC ATCCTGAAAG AGGCCGAGGC TGAGATGCAG 
2351 GAGCGCTACT TTGAGCCACT GGTGAAAAAA GAACAAATGG AAGAAAAGAT 
2401 GAGAAACATC AGAGAAGTGA AGTGCCGTGT CGTGACATGC AAGACGTGCG 
2451 CCTATACCCA CTTCAAGCTG CTGGAGACCT GCGTCAGTGA GCAGCATGAA 
2501 TACCACTGGC ATGATGGTGT GAAGAGGTTT TTCAAATGTC CCTGTGGAAA 
2551 CAGAAGCATC TCCTTGGACA GACTCCCGAA CAAGCACTGC AGTAACTGTG 
2601 GCCTCTACAA ATGGGAACGG GACGGAATGC TAAAGGTATG CCATTTGCGT 
2651 ACTAATTTTT GACTCCTTTT AGTGACCCAT GCTAATAATG TGGAACCATC 
2701 TCCTATTAAA ATATTTTCAT TTTTCTAGGA AAAGACTGGT CCAAAGATAG 
2751 GAGGAGAAAC TCTGTTACCA AGAGGAGAAG AACATGCTAA ATTTCTGAAC 
2801 AGCCTTAAAT AACCCGAACT TCAGACATTT TCCCACAGAC TTCCTGGCCT 
2851 CCTGTGACTC TGGAAAGCAA AGGATTGGCT GTGTATTGTC CATTGATTCC 
2901 TGATTGACGC CGTCAAAAAC AAATGCTTGT TAAGCCCATA AGCTTTGCCT 
2951 GCTTACTTTC TGCCATTGGG TTGGTTTGAT ACCACATTTA ACATTGACAT 
3001 TTAAGTGGAA AACCAAGTTA TCATTGTCTT TTCTAAGCTC AGTGTGGATG 
3051 ATTGCATTAC TTCATTCACT GAAGTTTTTG CCCAAAAATT GGAAGGTAAA 
3101 CAGAGAGCTA TGTTTCTGTA TCTTTTGGTT ATAGAGTGTT CACTTCTTTA 
3151 TCATAACAAA ATTCTAGTGT TTATACGAAC ACCCAGAGGC AAAAGAATTT 
3201 GGCTTAATTC TCACTCCAGG TAAGTAGCTT AACTTCTGGG CTTCAGTTTT 
3251 CTCATCTGTA AAATCAGGAA GATTGGACTA AGTGATCCTG AAATGTATTT 
3301 TTTAGCACTG GATTTCTACA AATAATAAAA CTTTCCCATC TAGATAATGA 
3351 TGATCACATA GTCTTGATGT ACGGACATTA AAAGCCAGAT TTCTTCATTC 
3401 AATTCTGTTA TCTCTGTTTT ACTCTTTGAA ATTGATCAAG CCACTGAATC 
3451 ACTTTGCATT TCAGTTTATA TATAGAGAGA GAAAGAAGGC TGTCTGCTCT 
3501 TACATTATTG TGGAGCCCTG TGATAGAAAT ATGTAAAATC TCATATTATT 
3551 TTTTTTTTAA TTTTTTTATT TTTTATGACA GGGTCTCACT ATGTCACCCT 
3601 GGCTGGAGTG CAGTAGTGCG ATCGCGGCAC ACTGCAGCCT TGGCTTCCCT 
3651 GGGCTCAAGC AGTCCTCCCA CCTCAGTCTC CCAAATAGCT AGGACTACAG 
3701 GCGTGCGTGA CCAAGCCCAG CTAATTTTTG CATTTTTTGT AGAGATGGGG 
3751 TTTTGCCATG TTGCTCAGGC TGGTCTCAAA CTCCTGAGCA CTAGCAATCC 
3801 ACCCACCTCT GTTTCCAAAA AAAAAAAAAA AATGAAAGGT CAACCCCTAT 
3851 GCAAATTACC ACAGCAAAGG TTTCATTCAG GAGATTCTTC CATCTGGGCA 
3901 ACCTGGTTTT CCAAATATCA TTTGACCTAA GTGAATGTTG ATACTAGCTA 
3951 AAGATTGGGT AAATTGGTTG AATTATTGTA TTGAAGCTTG AGCTGTAGCT 
4001 AAAAGTAATT TAGGTTTCCC CTAAGATGTT ATTATGTTAG GGACATAACA 
4051 CTTTTGGGAG GTTGTTGTGG GAGATGGTTG ATTTAGGTTT TCAAAAGCTA 
4101 GAAATAAAAT TTACATGCCT TAGATTTCAT AAAATTCTGC TCTAATTGGG 
4151 TGGAAGGTGC TGTATCTAAC TTGTGTTCCT CCTAAGGTTA TGTCCTAATA 
4201 ACTATTCTTT TAGGAGTATA CTTCTACTTT ATAGAAGGTT GCTTTTCTTT 
4251 TTAATTTTTT CTAACAAAGA AAAGAATAAA GTATTTATTA ATAAGAACCA 
4301 GAAAGCACTT GAAACTGATG TTTTTAATGG CTCATTTAGG GTAGATTTAT 
4351 TTATCTCATT AACTTAAAAC AGCTATGTGT ATGAAATAGG TCACAACAGA 
4401 ACTTGAACAC CAGGTTGGTG TCTGAGCAAT CCCTTTCTTA TGGGAAAAAC 
4451 AATGTTCTTG TTTGAACAGA GGGTATCATT GCAGTCAGTA TTCACGTGTA 
4501 TATTGTTATA TAAGTTGTAT AATATGCTTG TAAAGGCTGA GGGTGAGCTG 
4551 TATCTGGATG CCTTTTTACA ATTTGATTTT AACTTTTAAA ATAAATTTAA 
4601 AACATAAAAA AAAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 95 bp to 2659 bp; peptide length: 855 
Category: similarity to known protein 
Classification: Cell division 


1 MDEEEDNLSL LTALLEENES ALDCNSEENN FLTRENGEPD AFDELFDADG 
51 DGESYTEEAD DGETGETRDE KENLATLFGD MEDLTDEEEV PASQSTENRV 
101 LPAPAPRREK TNEELQEELR NLQEQMKALQ EQLKVTTIKQ TASPARLQKS 
151 PEKSPRPPLK ERRVQRIQES TCFSAELDVP ALPRTKRVAR TPKPSPPDPK 
201 SSSSRMTSAP SQPLQTISRN KPSGITRGQI VGTPGSSGET TQPICVEAFS 
251 GLRLRRPRVS STEMNKKMTG RKLIRLSQIK EKMAREKLEE IDWVTFGVIL 
301 KKVTPQSVNS GKTFSIWKLN DLRDLTQCVS LFLFGEVHKA LWKTEQGTVV 
351 GILNANPMKP KDGSEEVCLS IDHPQKVLIM GEALDLGTCK AKKKNGEPCT 
401 QTVNLRDCEY CQYHVQAQYK KLSAKRADLQ STFSGGRIPK KFARRGTSLK 
451 ERLCQDGFYY GGVSSASYAA SIAAAVAPKK KIQTTLSNLV VKGTNLIIQE 
501 TRQKLGIPQK SLSCSEEFKE LMDLPTCGAR NLKQHLAKAS ASGIMGSPKP 
551 AIKSISASAL LKQQKQRMLE MRRRKSEEIQ KRFLQSSSEV ESPAVPSSSR 
601 QPPAQPPRTG SEFPRLEGAP ATMTPKLGRG VLEGDDVLFY DESPPPRPKL 
651 SALAEAKKLA AITKLRAKGQ VLTKTNPNSI KKKQKDPQDI LEVKERVEKN 
701 TMFSSQAEDE LEPARKKRRE QLAYLESEEF QKILKAKSKH TGILKEAEAE 
751 MQERYFEPLV KKEQMEEKMR NIREVKCRVV TCKTCAYTHF KLLETCVSEQ 
801 HEYHWHDGVK RFFKCPCGNR SISLDRLPNK HCSNCGLYKW ERDGMLKVCH 
851 LRTNF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl5, frame 2 

TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell 
division cycle protein 23"; S.pombe chromosome II cosmid c1347., N = 
2, Score = 284, P = 7e-21 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7e-12 

TREMBL:SCDNA52A_1 gene: "DNA52"; Saccharomyces cerevisiae DNA52 gene, 
complete cds., N = 2, Score = 201, P = 7.9e-12 

TREMBLNEW:AC006234_6 gene: "F5H14.6"; Arabidopsis thaliana chromosome 
II BAC F5H14 genomic sequence, complete sequence., N = 2, Score = 211, 
P = 1.7e-15 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7.2e-12 


>TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid c1347. 
Length = 593 

HSPs: 

Score = 284 (42.6 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 97/383 (25%), Positives = 186/383 (48%) 

Query: 109 EKTNEELQEELRNLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQ 168 
           E+ + +L+E + LQ Q+ +QE+ ++ + ++ AS + + PR P ++ RV + 
Sbjct:   8 EENDLDLEE--KRLQRQLNEIQEKKRLRSAQKEASSENAEVI--QVPRSPPQQVRVLTVS 63 

Query: 169 ESTCFSAE LDVPALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQP LQTIS 218 
           + L + K V+ P P PK R+ A +Q L+T+ 
Sbjct:  64 

Query: 219 
           +N+ R + + G S E P+ C ++ +S + +S + + G ++ 
Sbjct: 124 

Query: 276 
           K E E+D +V G++ T ++VN K + + L DL+ +C 
Sbjct: 184 

Query: 332 
           FLFG+ + WK + GTV+ +LN +KPK+ L +D VL+ +G + LG C 
Sbjct: 240 

Query: 390 
           +++K+GE C ++ R + C+YHV ++ + R + S+ + P+ ARR 
Sbjct: 300 SSRRKSGELCKHWLDKRAGDVCEYHVDLAVQRSMSTRTEFASSMATMHEPR--ARR 353 

Query: 450 
           ++R GF Y+ G ++ ++A + +QT 
Sbjct: 354 EKRFRGQGFQGYFAGEKYSAI PNAVAGLYDAEDAVQT 390 

Score = 41 (6.2 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 12/43 (27%), Positives = 17/43 (39%) 

Query: 453 LCQDGFYYGGVSSASYAASIAAAVAPKKKIQTTLSNLVVKGTN 495 
           L +D S AS A++ K + SN + GTN 
Sbjct: 465 LSKDSEIDSSTKKPSVLASFNASIMNPKSSLPSFSNSAILGTN 507 

Score = 40 (6.0 bits), Expect = 8.9e-21, Sum P(2) = 8.9e-21 
Identities = 13/26 (50%), Positives = 18/26 (69%) 

Query: 536 LAKASASGIMGSPKPAIKSISASALL 561 
           LA +AS IM +PK ++ S S SA+L 
Sbjct: 481 

Pedant information for DKFZphtes3_2hl5, frame 2 


Report for DKFZphtes3_2hl5.2 


[LENGTH] 855 
[MW] 96135.01 
[pI] 8.96 
[HOMOL] TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid c1347. 5e-16 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL150c] 1e-11 
[FUNCAT] DNA synthesis and replication [S. cerevisiae, YIL150c] 1e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL150c] 1e-11 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 12.05 % 
[KW] COILED_COIL 4.21 % 

SEQ MDEEEDNLSLLTALLEENESALDCNSEENNFLTRENGEPDAFDELFDADGDGESYTEEAD 
SEG 

PRD cccchhhhhhhhhhhhhhhhccccccccceeeeccccccccceeeecccccccceeeeec 


COILS 


SEQ DGETGETRDEKENLATLFGDMEDLTDEEEVPASQSTENRVLPAPAPRREKTNEELQEELR 

SEG xxxxxxxxxxxx xxxxxxxxx 

r^ic ^^*^^^*=^^*=^^*^*^^^**^^^^^^cccceeeccccccccccccccccccchhhhhhhhhhh^ 
CCCCCCCCCCCCCC 

SEQ NLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQESTCFSAELDVP 

SEG xxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccc^^ 

COILS CCCCCCCCCCCCCCCCCCCCCC 


SEQ ALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQPLQTISRNKPSGITRGQIVGTPGSSGET 
SEG xxxxxxxxxxxxx 

PRD cccccceeeecccccccccccchhhhhhhccccchhhhhhccccccceeeeec^^ 


COILS 


SEQ TQPICVEAFSGLRLRRPRVSSTEMNKKMTGRKLIRLSQIKEKMAREKLEEIDWVTFGVIL 
SEG 

COILS ^*^^^^*^^^*^*^'**^^^^*^**'^^^*^^**'^^^*^^^*^***^*^^*^'^hhhhhhh^ 

SEQ KKVTPQSVNSGKTFSIWKLNDLRDLTQCVSLFLFGEVHKALWKTEQGTVVGILNANPMKP 

PRD cccccccccccceeeeeeeccchhhhhhheeeeecchhhhhhhhccceeeeec^ 

COILS 

SEQ KDGSEEVCLSIDHPQKVLIMGEALDLGTCKAKKKNGEPCTQTVNLRDCEYCQYHVQAQYK 

SEG 

^^^rr. ^*=^*^*=^^®^^^^^*=«=c««eccccccccccccccccccccceeecccccccchhhhh^^ 

COILS 

SEQ KLSAKRADLQSTFSGGRIPKKFARRGTSLKERLCQDGFYYGGVSSASYAASIAAAVAPKK 

SEG xxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhccccccccccccccchhhhhhhccccccccccchhhhhhhhhhhhcch 

COILS 

SEQ KIQTTLSNLVVKGTNLIIQETRQKLGIPQKSLSCSEEFKELMDLPTCGARNLKQHLAKAS 

SEG 

PRD hhhhhhheeecccceeeehhhhhhhcccccccchhhhhhhhhhccccccc^^ 

COILS 

SEQ ASGIMGSPKPAIKSISASALLKQQKQRMLEMRRRKSEEIQKRFLQSSSEVESPAVPSSSR 
SEG xxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 
COILS 

SEQ QPPAQPPRTGSEFPRLEGAPATMTPKLGRGVLEGDDVLFYDESPPPRPKLSALAEAKKLA 

SEG xxxxxxxx XXXXXX.XXXXXX 

^^7tc ^^^^^^^^*=^^^^^ccccccccccccccccccccceeeeeccccccchhhhhhhhhhhhh 

COILS 

SEQ AITKLRAKGQVLTKTNPNSIKKKQKDPQDILEVKERVEKNTMFSSQAEDELEPARKKRRE 

SEG XXXXX 

^^TT.. ^^*^*^^^^^^^^^«cccccccccccccchhhhhhhhhhhhccchhh^^ 

COILS 

SEQ QIAYLESEEFQKILKAKSKHTGILKEAEAEMQERYFEPLVKKEQMEEKMRNIREVKCRVV 

SEG 

COILS 

SEQ TCKTCAYTHFKLLETCVSEQHEYHWHDGVKRFFKCPCGNRSISLDRLPNKHCSNCGLYKW 
SEG 

PRD eeecceeeeeeecccceeeccccccccceeeeeecccccccccccccccccccccceeec 

COILS 

SEQ ERDGMLKVCHLRTNF 

SEG 

PRD CCCCCCCCCCCCCCC 

COILS 


(No Prosite data available for DKFZphtes3_2hl5.2) 
(No Pfam data available for DKFZphtes3_2hl5.2) 

DKFZphtes3_2i5 


group: testes derived 

DKFZphtes3_2i5 encodes a novel 151 amino acid protein with weak similarity to C. elegans 
cosmid F20D12.3. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to C. elegans F20D12.3 

many ATGs in front of the start of the ORF, 
unspliced intron in 5' region? 

Sequenced by EMBL 

Locus : unknown 

Insert length: 2142 bp 

Poly A stretch at pos. 2121, polyadenylation signal at pos. 2102 

1 GCAGTAAATA TGATATGAAA GAATTCTCTA ACTTGGGGGT GGCTTGTAAC 
51 CTGTAATAAA AATATTGCTA AAATACCTTC TCTCACTTTG AAAAAGCATC 

101 TGAGCAATCC TCAGTTATTG GTGAATTCTT ACCAGTGTTT AATTCCTCTC 

151 TTTCCGTTAT GGTCTTAGTG TGGTTGTCCT GGTGTAGTAT TTCAAGAGGA 

201 ACCTGCAGCA AGATGAAAAG AGAGTGGGAC TTGGAGCTAA GAACGTTTTT 

251 GGCTTTAAGT GCTACGTTAA CTCATTAAAT TCTTAGTGAT CTTGGGGAAG 

301 TCCCCTCACC AGTGTGAGCC TCAGTTTTCT TATCTAATAA GTAAGGATAA 

351 TCTTACCCAC CTTATTGCGG GGGCCCGAGG ATTACATGAT TGGTGTAACA 

401 GTAGCACCTT GTACATTTGA AAGGACTAAT ACCAGTGGAC TTTAACCTTG 

451 GCTGGGCTTT GGAATTCTTG GTGGGACTTT TTAATCATGT AGATTCTCAG 

501 GCCCCTGCCT GGCCTGTGGA ACCACAGACT CTATAGGTGG GCCCTTCCAG 

551 AAGGCCTCAT GGGTGGTTCT CATGTGGAAC CTGTGTTGCA AGCCACTGCA 

601 TGGTGTTACT GCTATTAACA TTAAAACTTA TATTTTCCTT ATTGTGTGGA 

651 TATATCTGTG GTGTTTGCCC ATGTATACTT CATTTTACAT TTCTTAAAGA 

701 ATAGAATGGA ATGGTTTTAA GCACGCTACA TTGTCCAGGT TATACCCACA 

751 GAAGAGCTGT TGTGTAACAG AATCAGCATC ATACCTGAAT CATTTGTACA 

801 TTGCATATAA GACTATGTCT AAGTAGAAGA TGCTATGAAA TCATGTCTGC 

851 TGTGGGGCCA GGCATAATTA TGAATGTTAC TTAAGAGCAT AGGTGAGGTG 

901 AGAAAAGGGA ATGTGACTAG TGTTTTAGTA TTTTCTTGGT GTGGGATGAA 

951 GTATAATTCT TTTTTTTTTT TCTCAACAAA GCAGTAAAAC TAGAAAGAAG 
1001 GAGAACTCTT CCCTCAAGAA TGGCTGTACC TTCATATCTA GAGGCACATT 
1051 AAAAAAAAGA ACGTCTGTAC CTTAAAAATG GAGGTCATTT CATTGTGTTC 
1101 ATTTTCAAGG TTGTTGTATG GCTCGGTCAG AACTTTCTGT TACCAGAAGA 
1151 CACTCACATT CAGAATGCTC CATTTCAAGT GTGTTTCACA TCTTTACGGA 
1201 ATGGCGGCCA CCTGCATATA AAAATAAAAC TTAGTGGAGA GATCACTATA 
1251 AATACTGATG ATATTGATTT GGCTGGTGAT ATCATCCAGT CAATGGCATC 
1301 ATTTTTTGCT ATTGAAGACC TTCAAGTAGA AGCGGATTTT CCTGTCTATT 
1351 TTGAGGAATT ACGAAAGGTG CTAGTTAAGG TGGATGAATA TCATTCAGTG 
1401 CATCAGAAGC TCAGTGCTGA TATGGCTGAT CATTCTAATT TGATCCGAAG 
1451 TTTGCTGGTC GGAGCTGAGG ATGCTCGTCT GATGAGGGAC ATGAAAACAA 
1501 TGAAGAGTCG TTATATGGAA CTCTATGACC TTAATAGAGA CTTGCTAAAT 
1551 GGATATAAAA TTCGCTGTAA CAATCACACA GAGCTGTTGG GAAACCTCAA 
1601 AGCAGTAAAT CAAGCAATTC AAAGAGCAGG TCGTCTGCGG GTTGGAAAAC 
1651 CAAAGAACCA GGTGATCACT GCTTGTCGGG ATGCAATTCG AAGCAATAAC 
1701 ATCAACACAC TGTTCAAAAT CATGCGAGTG GGGACAGCTT CTTCCTAGGT 
1751 GAGGAAAATA CAGGTCATGA AGTTCCTGGC AAAGATTTTC TGTTAAAAAC 
1801 CTATGCTGGT TTGCTTTGGA TCACACCCTG GTGAACCCCG GGTGCTAAGA 
1851 ATGAAAATAA CCTTGGTGAG TTGTACAAAT TAAAGACAAA GAACTACATG 
1901 TGAAGATAGA CTTGCTTTCT ATTTTTAAAT CAGTAGTAGT ACTGTTGCTG 
1951 AATAATACTA GGTTTTTATG GAATAGGATG AATGCTTTTG AAGTATTAGG 
2001 GCTTCAGAGT CCAATTTTGC TTATTTATGG TATATAAATA CATATTTTTT 
2051 TCTTGAAATT GCAATTGAGT TTGTACTTTT CAAATAGATT ATCTACTTTT 
2101 TCATTAAAAT GTAAAGATGT TAAAAAAAAA AAAAAAAAAA AA 
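
The poly A stretch and polyadenylation signal noted for this insert can be located with a simple pattern search over the last line of the listing. The following is a minimal sketch, not the annotation pipeline's own method: the hexamers searched (AATAAA and the ATTAAA variant) and the 10-base threshold for a "stretch" are assumptions, and `locate` is a hypothetical helper.

```python
import re

# Last line of the listing above: bases 2101-2142 (1-based numbering).
TAIL_START = 2101
TAIL = "TCATTAAAATGTAAAGATGTTAAAAAAAAAAAAAAAAAAAAA"

SIGNAL = re.compile(r"AATAAA|ATTAAA")  # canonical hexamer and a common variant
POLY_A = re.compile(r"A{10,}")         # assumption: a stretch is 10 or more A's


def locate(pattern, seq, first_base):
    """Return the 1-based position of the first match, or None if absent."""
    m = pattern.search(seq)
    return None if m is None else first_base + m.start()


signal_pos = locate(SIGNAL, TAIL, TAIL_START)
poly_a_pos = locate(POLY_A, TAIL, TAIL_START)
print(signal_pos, poly_a_pos)  # 2103 2122
```

Counted this way, both features fall one base after the positions given in the record (2102 and 2121). That would be consistent with the record using 0-based offsets, although the source does not state its convention.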


BLAST Results 


No BLAST result 


Medline entries 

No Medline entry 


Peptide information for frame 3 


ORF from 1293 bp to 1745 bp; peptide length: 151 
Category: similarity to unknown protein 
Classification: no clue 

1 MASFFAIEDL QVEADFPVYF EELRKVLVKV DEYHSVHQKL SADMADHSNL 
51 IRSLLVGAED ARLMRDMKTM KSRYMELYDL NRDLLNGYKI RCNNHTELLG 
101 NLKAVNQAIQ RAGRLRVGKP KNQVITACRD AIRSNNINTL FKIMRVGTAS 
151 S 
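
The peptide can be checked against the nucleotide listing by translating the ORF with the standard genetic code. The sketch below builds the codon table from the usual compact 64-letter string; `translate` is an illustrative helper, and the test fragment is the first 30 bases of the ORF (bases 1293 to 1322 of the insert).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 1293-1322 of the insert, i.e. the first ten codons of the ORF
print(translate("ATGGCATCATTTTTTGCTATTGAAGACCTT"))  # MASFFAIEDL
```

The result matches the first ten residues of the peptide listed above.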


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2i5, frame 3 

TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid 
F20D12., N = 1, Score = 173, P = 4.5e-12 


>TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 
Length = 699 

HSPs: 


Score = 173 (26.0 bits), Expect = 4.5e-12, P = 4.5e-12 
Identities = 33/130 (25%), Positives = 72/130 (55%) 

Query:  20 FEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAEDARLMRDMKTMKSRYMELYD 79 
           F+E ++L ++D V +L+A++ + ++ +++ AED+ + ++ + Y+ L 
Sbjct: 569 FKEADEILEEIDPMTEVRDRLTAELQERQAAVKEIIIRAEDSIAIDNIPDARKFYIRLKA 628 

Query:  80 LNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKPKNQVITACRDAIRSNNINT 139 
           + ++R NN + +L+ +N+ I+ RLRVG+P Q++ +CR AI +N 
Sbjct: 629 NDAAARQAAQLRWNNQERCVKSLRRLNKIIENCSRLRVGEPGRQIVVSCRSAIADDNKQI 688 

Query: 140 LFKIMRVGTA 149 
           + KI++ G + 
Sbjct: 689 ITKILQYGAS 698 
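
The identity figures reported with the alignment can be reproduced by counting matching columns. The sketch below is illustrative only: `identity_stats` is a hypothetical helper, it counts identities but not BLAST "positives" (which would need a substitution matrix), and it is applied to the short final segment of the alignment.

```python
def identity_stats(query: str, subject: str):
    """Count identical aligned positions in two equal-length aligned strings.

    Gap characters ('-') never count as identities.
    Returns (identical_positions, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Final segment of the alignment (query 140-149, subject 689-698)
ident, length = identity_stats("LFKIMRVGTA", "ITKILQYGAS")
print(ident, length)          # 3 10

# Reported totals for the whole HSP: 33 identities over 130 aligned positions
print(round(100 * 33 / 130))  # 25
```

The three identities found in this segment (K, I and G) are the letters shown on its match line, and 33/130 rounds to the reported 25%.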


Pedant information for DKFZphtes3_2i5, frame 3 


Report for DKFZphtes3_2i5.3 


[LENGTH] 151 

[MW] 17304.07 

[pI] 9.33 

[HOMOL] TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 2e-12 

[KW] Alpha_Beta 


SEQ MASFFAIEDLQVEADFPVYFEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAED 

PRD ccceeeehhhhhhccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ ARLMRDMKTMKSRYMELYDLNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKP 

PRD hhhhhccchhhhhheeeccchhhhhhheeeeeccchhhhhhhhhhhhhhhhhcccccccc 

SEQ KNQVITACRDAIRSNNINTLFKIMRVGTASS 

PRD cceeeeehhhhhhcccceeeeccceeecccc 


(No Prosite data available for DKFZphtes3_2i5.3) 
(No Pfam data available for DKFZphtes3_2i5.3) 

group: testes derived 

DKFZphtes3_2119 encodes a novel 166 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

complete cDNA, complete cds, no EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 1079 bp 

Poly A stretch at pos. 1053, polyadenylation signal at pos. 1038 


1 CCACAGGACA CACTGTTCCC AGGGCACAGA CACCCTGGGC TTTGGTTGGG 
51 TCTTGGCCTC CAGGTAGGGC CCTGTTGGGC AGCGGGCAGC AACTCCTGAG 
101 ACACTACTGT GATTCTTGGT GGTGGCTGTG GTAAAAAACC TGCAGGGCTA 
151 GAGTTTGGGG TGAGATTCAG CAGTAACTGT GGCCTCTCCT AGTGACAGTA 
201 TGTCACTCCC ACTCCCAGCA CGCATGCCCA CAGGCCACGG CCTCCACATC 
251 ACAAACCCCC CACCAAGTTG CCCATCTATG GAGCAGCTCC CATACGGCAG 
301 GGTCAGGCTC TTACCTCCAC CTCCAGGGCA CAGACAGGGG GAGCTCTGTC 
351 TCACTGTAAG GCAATGAGGA GAGTTGAGGG CCCAGACCAG GCTAGGGGCC 
401 ATCCCCTTTC CCGAGCAGGC CTCAGGGAAG GACCAGCCCC ATTCCCATCT 

451 GACCTAGGTC TTAGCCCAGG AGCCTGCATA GGGAAGAAAG GACAGACAGG 
501 GCCTCCTTAC TGGCTGACAC TCAGGAGGGG CTGGGGCAAG AGAGCAGAGG 
551 GAGCGCAGGG CCAGGCAGGG GCTGCTGAGG ATCCATGGGA GCTCAGGGTG 
601 CACAAGGGGG CTGCCCTTCC TGGGCTGCAG GCAGCATCCC TATGGGAGCT 
651 GAGAAAGTCC AATCCTGAGA TGGGACAGTG CTGCCCAGGG GTGTGTGGCT 
701 GGGCCCTGAC AACAGTCTCC CCAAAAGTGA CCACATCACC AGGCTCAGTT 
751 CCAGGAAGGC TGAGAAGTGC CCAGTACACT GAGGATGCAC CTCAGTTACA 
801 TAAAATAAAT GAAACTGGAG TACTAACGTA CAGTTTAAAG GTTATAGTTA 
851 CTATTTTTAT ATGATATACT AGTAATTTTT GAATAGGGTA AACTTTAGGT 
901 GTTTTGACAC CAAAAGAAAA CTACATGAGT TCATGCATGT GTTAAATTGC 
951 TTTACTGTAG TAATCATTTA CATGTATATG TATATATGAA TATAATTATG 

1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC 

1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA 
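
Because each row of the listing starts with the position of its first base, the rows can be joined and checked for consistency against the stated insert length. The following is a minimal sketch under the assumption that positions are 1-based; `parse_numbered_sequence` is a hypothetical helper, shown here on the last two rows of the listing.

```python
def parse_numbered_sequence(lines):
    """Join position-numbered sequence rows into one string.

    Each row is expected to look like '1051 TGGAAAAAAA AAAAAAAAAA ...':
    a 1-based start position followed by blocks of bases.
    Returns (first_position, sequence).
    """
    first_pos = None
    seq = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        pos, blocks = int(fields[0]), "".join(fields[1:])
        if first_pos is None:
            first_pos = pos
        expected = first_pos + sum(len(s) for s in seq)
        if pos != expected:
            raise ValueError(f"row starts at {pos}, expected {expected}")
        seq.append(blocks)
    return first_pos, "".join(seq)


rows = [
    "1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC",
    "",
    "1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA",
]
start, seq = parse_numbered_sequence(rows)
print(start + len(seq) - 1)  # 1079
```

The last base falls at position 1079, which agrees with the insert length given for this clone. A row whose leading number does not match the running count (a typical OCR fault) raises an error instead of being silently joined.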


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 364 bp to 861 bp; peptide length: 166 
Category: putative protein 
Classification: no clue 

1 MRRVEGPDQA RGHPLSRAGL REGPAPFPSD LGLSPGACIG KKGQTGPPYW 
51 LTLRRGWGKR AEGAQGQAGA AEDPWELRVH KGAALPGLQA ASLWELRKSN 
101 PEMGQCCPGV CGWALTTVSP KVTTSPGSVP GRLRSAQYTE DAPQLHKINE 
151 TGVLTYSLKV IVTIFI 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2119, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2119, frame 1 


Report for DKFZphtes3_2119.1 


[LENGTH] 166 
[MW] 17691.35 
[pI] 9.54 
[KW] All_Beta 
[KW] LOW_COMPLEXITY 7.23 % 


SEQ MRRVEGPDQARGHPLSRAGLREGPAPFPSDLGLSPGACIGKKGQTGPPYWLTLRRGWGKR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccceeeeecccccc 

SEQ AEGAQGQAGAAEDPWELRVHKGAALPGLQAASLWELRKSNPEMGQCCPGVCGWALTTVSP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeeccccccccchhhhhhhhhhcccccccccccccceeeeeccc 

SEQ KVTTSPGSVPGRLRSAQYTEDAPQLHKINETGVLTYSLKVIVTIFI 

SEG 

PRD ccccccccccccccccccccccccceeeccccceeeehhhhhhccc 


(No Prosite data available for DKFZphtes3_2119.1) 
(No Pfam data available for DKFZphtes3_2119.1) 


DKFZphtes3_2ml8 


group: nucleic acid management 

DKFZphtes3_2ml8 encodes a novel 950 amino acid protein with similarity to mouse Dhm1. 

The protein seems to play a role in nucleotide and RNA metabolism, but also in DNA 
repair and cell cycle. The yeast homologue is a DNA strand exchange protein required for 
sporulation and homologous recombination. 

The novel protein can find application as a multifunctional nuclease / exoribonuclease. 


nearly identical to mouse Dhm1 

complete cDNA, complete cds, start at bp 42, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3022 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2981 


1 CTCGTCAGCC GGTCGGCCGC CGCCTCCAGC CGTGTGCCGC TATGGGAGTC 
51 CCGGCGTTCT TCCGCTGGCT CAGCCGCAAG TACCCGTCCA TCATAGTCAA 
101 CTGCGTGGAA GAGAAGCCAA AAGAATGCAA TGGTGTAAAG ATTCCAGTTG 
151 ATGCCAGTAA ACCTAATCCA AATGATGTGG AGTTTGATAA TCTGTATTTG 
201 GATATGAATG GAATCATCCA TCCCTGTACT CATCCTGAAG ACAAACCAGC 
251 ACCAAAAAAT GAAGATGAAA TGATGGTTGC AATTTTTGAG TACATTGACA 
301 GACTTTTCAG TATTGTAAGA CCAAGAAGAC TTCTCTACAT GGCAATAGAT 
351 GGAGTGGCAC CACGTGCTAA AATGAACCAG CAGCGTTCAA GGAGGTTCAG 
401 GGCATCAAAA GAAGGAATGG AAGCAGCAGT CGAGAAGCAG CGAGTCAGGG 
451 AAGAAATATT GGCAAAAGGT GGCTTTCTTC CTCCAGAAGA AATAAAAGAA 
501 AGATTTGACA GCAACTGTAT TACACCAGGA ACTGAATTCA TGGACAATCT 
551 TGCTAAATGC CTTCGCTATT ACATAGCTGA TCGTTTAAAT AATGACCCTG 
601 GGTGGAAAAA TTTGACAGTT ATTTTATCTG ATGCTAGTGC TCCTGGTGAA 
651 GGAGAACATA AAATCATGGA TTACATTAGA AGGCAAAGAG CCCAGCCTAA 
701 CCATGACCCA AATACTCATC ATTGTTTATG TGGAGCAGAT GCTGATCTCA 
751 TTATGCTTGG CCTTGCCACA CATGAACCGA ACTTTACCAT TATTAGAGAA 
801 GAATTCAAAC CAAACAAGCC CAAACCATGT GGTCTTTGTA ATCAGTTTGG 
851 ACATGAGGTC AAAGATTGTG AAGGTTTGCC AAGAGAAAAG AAGGGAAAGC 
901 ATGATGAACT TGCCGATAGT CTTCCTTGTG CAGAAGGAGA GTTTATCTTC 
951 CTTCGGCTTA ATGTTCTTCG TGAGTATTTG GAAAGAGAAC TCACAATGGC 
1001 CAGCCTACCA TTCACATTTG ATGTTGAGAG GAGCATTGAT GACTGGGTTT 
1051 TCATGTGCTT CTTTGTGGGA AATGACTTCC TCCCTCATTT GCCATCGTTA 
1101 GAGATTAGGG AAAATGCAAT TGACCGTTTG GTTAACATAT ACAAAAATGT 
1151 GGTACACAAA ACTGGGGGTT ACCTTACAGA AAGTGGTTAT GTCAATCTGC 
1201 AAAGAGTACA GATGATCATG TTAGCAGTTG GTGAAGTTGA GGATAGCATT 
1251 TTTAAAAAGA GAAAGGATGA TGAGGACAGT TTTAGAAGAC GACAGAAAGA 
1301 AAAAAGAAAG AGAATGAAGA GAGATCAACC AGCTTTCACT CCTAGTGGAA 
1351 TATTAACTCC TCATGCCTTG GGTTCAAGAA ATTCACCAGG TTCTCAAGTA 
1401 GCCAGTAATC CGAGACAAGC AGCCTATGAA ATGAGGATGC AGAATAACTC 
1451 TAGTCCTTCG ATATCTCCTA ATACGAGTTT CACATCTGAT GGCTCCCCGT 
1501 CTCCATTAGG AGGAATTAAG CGAAAAGCAG AAGACAGTGA CAGTGAACCT 
1551 GAGCCAGAGG ATAATGTCAG GTTATGGGAA GCTGGCTGGA AGCAGCGGTA 
1601 CTACAAGAAC AAATTTGATG TGGATGCAGC TGATGAGAAA TTCCGTCGGA 
1651 AAGTTGTGCA GTCGTACGTT GAAGGACTTT GCTGGGTTCT TAGATATTAT 
1701 TACCAGGGCT GTGCTTCCTG GAAGTGGTAT TATCCATTTC ATTATGCACC 
1751 ATTTGCTTCA GACTTTGAAG GCATTGCAGA CATGCCATCT GATTTTGAGA 
1801 AGGGTACGAA ACCGTTTAAA CCACTAGAAC AACTTATGGG GGTATTTCCA 
1851 GCTGCAAGTG GTAATTTTCT ACCTCCATCA TGGCGGAAGC TCATGAGTGA 
1901 TCCTGATTCT AGTATAATTG ACTTCTATCC TGAAGATTTT GCTATTGATT 
1951 TGAATGGGAA GAAATATGCA TGGCAAGGTG TTGCTCTCTT GCCATTCGTG 
2001 GATGAGCGAA GGCTACGAGC TGCCCTAGAA GAGGTATACC CAGACCTCAC 
2051 TCCAGAAGAG ACCAGAAGAA ACAGCCTTGG AGGTGATGTC TTATTTGTGG 
2101 GGAAACATCA CCCACTCCAT GACTTCATTT TAGAGCTGTA CCAGACAGGT 
2151 TCCACAGAGC CAGTGGAGGT ACCCCCTGAA CTATGTCATG GGATTCAAGG 
2201 AAAGTTTTCT TTGGATGAAG AAGCCATTCT TCCAGATCAA ATAGTATGTT 
2251 CTCCTGTTCC TATGTTAAGG GATCTGACAC AGAACACTGT AGTCAGTATT 
2301 AATTTTAAAG ACCCACAGTT TGCTGAAGAT TACATTTTTA AAGCTGTAAT 
2351 GCTTCCAGGA GCAAGAAAGC CAGCAGCAGT ACTGAAACCT AGTGACTGGG 
2401 AAAAATCCAG CAATGGACGG CAGTGGAAGC CTCAGCTTGG CTTTAACCGT 
2451 GACCGGAGGC CTGTGCACCT GGATCAGGCA GCCTTCAGGA CTTTGGGCCA 
2501 TGTGATGCCA AGAGGCTCAG GAACTGGCAT TTACAGCAAT GCTGCACCAC 
2551 CACCTGTGAC TTACCAGGGA AACTTATACA GGCCGCTTTT GAGAGGACAA 
2601 GCCCAGATTC CAAAACTTAT GTCAAATATG AGGCCCCAGG ATTCCTGGCG 
2651 AGGTCCTCCT CCCCTTTTCC AGCAGCAAAG GTTTGACAGA GGCGTTGGGG 
2701 CTGAACCTCT GCTCCCATGG AACCGGATGC TGCAAACCCA GAATGCAGCC 

2751 TTCCAGCCAA ACCAGTACCA GATGCTAGCT GGGCCTGGTG GGTATCCACC 

2801 CAGACGAGAT GATCGTGGAG GGAGACAGGG ATATCCCAGA GAAGGAAGGA 

2851 AATACCCTTT GCCACCACCC TCAGGAAGAT ACAATTGGAA TTAAGCTTTT 

2901 GTAAAGCTTT CCCAAATCCT TTCATCATTC TACAGTTTTA TGCTATTTGT 

2951 GGAAAGATTT CCTTCTCAAG TAGTAGTTTT TAATAAAACT ACAGTACTTT 

3001 GTGTAAAAAA AAAAAAAAAA AA 


BLAST Results 


No BLAST result 


Medline entries 


95192042: 
Characterization of cDNA encoding mouse homolog of fission yeast dhp1+ gene: structural 
and functional conservation. 

97361754: 
Cloning and characterization of mouse Dhm2 cDNA, a functional homolog of budding yeast 
SEP1. 


Peptide information for frame 3 


ORF from 42 bp to 2891 bp; peptide length: 950 
Category: strong similarity to known protein 


1 MGVPAFFRWL SRKYPSIIVN CVEEKPKECN GVKIPVDASK PNPNDVEFDN 

51 LYLDMNGIIH PCTHPEDKPA PKNEDEMMVA IFEYIDRLFS IVRPRRLLYM 
101 AIDGVAPRAK MNQQRSRRFR ASKEGMEAAV EKQRVREEIL AKGGFLPPEE 
151 IKERFDSNCI TPGTEFMDNL AKCLRYYIAD RLNNDPGWKN LTVILSDASA 
201 PGEGEHKIMD YIRRQRAQPN HDPNTHHCLC GADADLIMLG LATHEPNFTI 
251 IREEFKPNKP KPCGLCNQFG HEVKDCEGLP REKKGKHDEL ADSLPCAEGE 
301 FIFLRLNVLR EYLERELTMA SLPFTFDVER SIDDWVFMCF FVGNDFLPHL 
351 PSLEIRENAI DRLVNIYKNV VHKTGGYLTE SGYVNLQRVQ MIMLAVGEVE 
401 DSIFKKRKDD EDSFRRRQKE KRKRMKRDQP AFTPSGILTP HALGSRNSPG 
451 SQVASNPRQA AYEMRMQNNS SPSISPNTSF TSDGSPSPLG GIKRKAEDSD 
501 SEPEPEDNVR LWEAGWKQRY YKNKFDVDAA DEKFRRKVVQ SYVEGLCWVL 
551 RYYYQGCASW KWYYPFHYAP FASDFEGIAD MPSDFEKGTK PFKPLEQLMG 
601 VFPAASGNFL PPSWRKLMSD PDSSIIDFYP EDFAIDLNGK KYAWQGVALL 
651 PFVDERRLRA ALEEVYPDLT PEETRRNSLG GDVLFVGKHH PLHDFILELY 
701 QTGSTEPVEV PPELCHGIQG KFSLDEEAIL PDQIVCSPVP MLRDLTQNTV 
751 VSINFKDPQF AEDYIFKAVM LPGARKPAAV LKPSDWEKSS NGRQWKPQLG 
801 FNRDRRPVHL DQAAFRTLGH VMPRGSGTGI YSNAAPPPVT YQGNLYRPLL 
851 RGQAQIPKLM SNMRPQDSWR GPPPLFQQQR FDRGVGAEPL LPWNRMLQTQ 
901 NAAFQPNQYQ MLAGPGGYPP RRDDRGGRQG YPREGRKYPL PPPSGRYNWN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2ml8, frame 3 

PIR:I49635 mouse Dhm1 protein - mouse, N = 1, Score = 4765, P = 0 

PIR:S43891 dhp1 protein - fission yeast (Schizosaccharomyces pombe), N 
= 3, Score = 1172, P = 2e-197 

PIR:S20126 exoribonuclease RAT1 (EC 3.1.11.-) - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 1146, P = 3.8e-175 

PIR:S72531 exonuclease II - fission yeast (Schizosaccharomyces pombe), 
N = 4, Score = 622, P = 4.2e-125 


>PIR:I49635 mouse Dhm1 protein - mouse 
Length = 947 

HSPs: 

Score = 4765 (714.9 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 884/930 (95%), Positives = 895/930 (96%) 


Query:   1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60 
           MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 
Sbjct:   1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60 

Query:  61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 
           PCTHPEDKPAPKNEDEMMVAIFEYIDRLF+IVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 
Sbjct:  61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFNIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 

Query: 121 ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 
           A K GMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 
Sbjct: 121 AIKGGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 

Query: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 240 
           RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPN DPNTHHCLCGADADLIMLG 
Sbjct: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNQDPNTHHCLCGADADLIMLG 240 

Query: 241 LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300 
           LATHEPNFTIIREEFKPNKPKPC LCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 
Sbjct: 241 LATHEPNFTIIREEFKPNKPKPCALCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300 

Query: 301 FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 360 
           FIFLRLNVLREYLERELTMASLPF FDVERS DDW FMCFFVGNDFLPHLPSLEIRE AI 
Sbjct: 301 FIFLRLNVLREYLERELTMASLPFPFDVERSNDDWEFMCFFVGNDFLPHLPSLEIREGAI 360 

Query: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420 
           DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 
Sbjct: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420 

Query: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 480 
           KRKRMKRDQPAFTPSGILTPHALGSRNSPG QVASNPRQAAYEMRMQ NSSPSISPNTSF 
Sbjct: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGCQVASNPRQAAYEMRMQRNSSPSISPNTSF 480 

Query: 481 TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 
            SDGSPSPLGGI+RKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 
Sbjct: 481 ASDGSPSPLGGIRRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 

Query: 541 SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 600 
           SYVEGLCWVLRYYYQGCASWKW YPFHYAPFASDFEGIADM S+FEKGTKPFKPLEQLMG 
Sbjct: 541 SYVEGLCWVLRYYYQGCASWKWLYPFHYAPFASDFEGIADMSSEFEKGTKPFKPLEQLMG 600 

Query: 601 VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660 
           VFPAASGNFLPP+WRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 
Sbjct: 601 VFPAASGNFLPPTWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660 

Query: 661 ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 720 
           ALEEVYPDLTPEE RRNSLGGDVLFVGK HPL DFILELYQTGSTEPV+VPPELCHGIQG 
Sbjct: 661 ALEEVYPDLTPEENRRNSLGGDVLFVGKLHPLRDFILELYQTGSTEPVDVPPELCHGIQG 720 

Query: 721 KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 780 
            FSLDEEAILPDQ VCSPVPMLRDLTQNT VSINFKDPQFAEDY+FKA MLPGARKPA V 
Sbjct: 721 TFSLDEEAILPDQTVCSPVPMLRDLTQNTAVSINFKDPQFAEDYVFKAAMLPGARKPATV 780 

Query: 781 LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 840 
           LKP DWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHV PRGSGT +Y+N A  P 
Sbjct: 781 LKPGDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVTPRGSGTSVYTNTALLPAN 840 

Query: 841 YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 900 
           YQGN YRPLLRGQAQIPKLMSNMRP+DSWRGPPPLFQQ RF+R VGAEPLLPWNRM+Q Q 
Sbjct: 841 YQGNNYRPLLRGQAQIPKLMSNMRPKDSWRGPPPLFQQHRFERSVGAEPLLPWNRMXQNQ 900 

Query: 901 NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ 929 
           NAAFQPNQYQML GPGGYPPRRDD RGGRQ 
Sbjct: 901 NAAFQPNQYQMLGGPGGYPPRRDDHRGGRQ 930 


Pedant information for DKFZphtes3_2ml8, frame 3 


Report for DKFZphtes3_2ml8.3 


[LENGTH] 950 
[MW] 108582.68 
[pI] 7.26 
[HOMOL] PIR:I49635 mouse Dhm1 protein - mouse 0.0 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR048c] 1e-123 
[FUNCAT] 04.01.04 rRNA processing [S. cerevisiae, YOR048c] 1e-123 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR048c] 1e-123 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGL173c] 3e-79 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL173c] 3e-79 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL173c] 3e-79 
[PIRKW] nucleus 1e-126 
[PIRKW] hydrolase 1e-122 
[PIRKW] exoribonuclease 1e-122 
[PROSITE] MYRISTYL 7 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 4 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 6.21 % 


SEQ MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 

SEG 

PRD cccchhhhhhhhhcceeeeeeecccccccccccccccccccccccccccceeeeccceee 

MEM 

SEQ PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhcceeeeeeeccccchhhhhhhhhhhhh 

MEM 

SEQ ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 

SEG 

PRD hcccccccceeeeeeeccccccccchhhhhhhhhhhhccccccccccccccccccceeec 

MEM 

SEQ LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 

SEG 

PRD ccccccccccccccccccccccceeeccccccccccccccchhhhhhhhhcccccccccc 

MEM 

SEQ FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhchhhhhhhhhhhheeeeeeccccccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 

SEG xxxxxx 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhh 

MEM 


SEQ KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 

SEG xxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccchhhhhhhhhhhhhhhccccccccccccc 

MEM 

SEQ TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 

SEG xx xxxxxxxxxxx 

PRD ccccccccchhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 

SEQ SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 

SEG 

PRD hhhhhhheeeeeeccccccccccccccccccccccccccccccccccccccccchhhhhh 

MEM 

SEQ VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

SEG 

PRD hccccccccccccccccccccccceeeccccceeeccccccceeeeeeeeeccchhhhhh 

MEM 

SEQ ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 

SEG 

PRD hhhhhccccchhhhhhcccccceeeeeecccchhhhhhhhhcccccceeecccccccccc 

MEM 

SEQ KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 

SEG 

SEQ LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 

PRD eccccccccccccc^ccccccccccccccchhhhhhhhhhccccccci^cccccccccccc 


SEQ YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 

MEM 

SEQ NAAFQPNQYQMLAGPGGYPPRRDDRGGRQGYPREGRKYPLPPPSGRYNWN 


SEG xxxxxxxxxxxxxxxxxxxx 

ncccccccceei 

MEM 


PRD *»cccccccceeeccccccccccc"cccccccc^ccccc^ 


Prosite for DKFZphtes3_2ml8.3 

PS00001  190->194  ASN_GLYCOSYLATION  PDOC00001 
PS00001  247->251  ASN_GLYCOSYLATION  PDOC00001 
PS00001  468->472  ASN_GLYCOSYLATION  PDOC00001 
PS00001  477->481  ASN_GLYCOSYLATION  PDOC00001 
PS00002  826->830  GLYCOSAMINOGLYCAN  PDOC00002 
PS00004  675->679  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005   11->14   PKC_PHOSPHO_SITE   PDOC00005 
PS00005  116->119  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  413->416  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  559->562  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  613->616  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  674->677  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  868->871  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  944->947  PKC_PHOSPHO_SITE   PDOC00005 
PS00006   63->67   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  331->335  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  499->503  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  501->505  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  541->545  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  573->577  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  583->587  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  619->623  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  624->628  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  670->674  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  723->727  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  784->788  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  659->667  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  125->131  MYRISTYL           PDOC00008 
PS00008  375->381  MYRISTYL           PDOC00008 
PS00008  450->456  MYRISTYL           PDOC00008 
PS00008  600->606  MYRISTYL           PDOC00008 
PS00008  825->831  MYRISTYL           PDOC00008 
PS00008  829->835  MYRISTYL           PDOC00008 
PS00008  926->932  MYRISTYL           PDOC00008 
PS00009  638->642  AMIDATION          PDOC00009 
PS00009  934->938  AMIDATION          PDOC00009 
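
The Prosite site positions listed above can be spot-checked by scanning the peptide with the corresponding pattern. The sketch below does this for PS00001 (ASN_GLYCOSYLATION), whose pattern is N-{P}-[ST]-{P}; the regular expression is a direct transcription of that pattern, `n_glycosylation_sites` is a hypothetical helper, and the fragment is residues 181 to 200 of the 950 amino acid peptide given earlier in this record.

```python
import re

# PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}, written as a regex.
# The zero-width lookahead lets overlapping sites be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(peptide: str, first_residue: int = 1):
    """Return the 1-based positions of the Asn in each pattern match."""
    return [first_residue + m.start() for m in N_GLYC.finditer(peptide)]


# Residues 181-200 of the peptide
fragment = "RLNNDPGWKNLTVILSDASA"
print(n_glycosylation_sites(fragment, first_residue=181))  # [190]
```

The single hit at residue 190 corresponds to the first ASN_GLYCOSYLATION entry in the site list. The two asparagines at 183 and 184 are not reported because neither is followed by Ser or Thr two residues downstream.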


(No Pfam data available for DKFZphtes3_2ml8.3) 

DKFZphtes3_2m20 


group: testes derived 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

EST hits are only from testis or uterus libraries 
remaining intron in 3' UTR, see EST-BLAST 

Sequenced by EMBL 

Locus : unknown 

Insert length: 1341 bp 

Poly A stretch at pos. 1320, polyadenylation signal at pos. 1300 


1 GCAATCCAGG AGCTGAATGG TAACTCTTCC ACAAGCGAAA ACTGTTCGTG 
51 AATACAAGCA AAAGGCCCCC CAAGAGGACC CCTGATATGA TCCAGCAGCC 
101 TCGGGCCCCG CTGGTGTTGG AGAAGGCTTC TGGTGAAGGA TTTGGCAAAA 
151 CCGCCGCTAT TATACAGCTC GCTCCTAAAG CTCCTGTTGA CCTGTGTGAG 
201 ACAGAGAAAC TGAGGGCAGC CTTCTTTGCA GTCCCGTTGG AAATGAGAGG 
251 GTCCTTCCTG GTGCTGCTCC TGAGGGAATG CTTCCGAGAC CTGAGCTGGC 
301 TGGCACTCAT CCATAGCGTC CGTGGGGAGG CGGGGCTGCT GGTGACGAGT 
351 ATCGTCCCGA AGACCCCGTT TTTCTGGGCC ATGCACATCA CTGAGGCTCT 
401 GCACCAGAAC ATGCAGGCTC TGTTTAGCAC CCTGGCTCAG GCGGAGGAGC 
451 AGCAGCCCTA CCTGGAGGCT CCACCGTTAT GCGCGGGACT CGCTGTCTGG 
501 CAGAGTACCA CCTGGGGGAT TATGGACACG CCTGGAACAG GTGTTGGGTG 
551 CTGGACAGGG TGGACACCTG GGCTGTGGTC ATGTTCATTG ATTTTGGACA 
601 GTTGGCCACC ATCCCTGTGC AGTCTCTGCG CCAGCTAGAC AGCGACGACT 
651 TCTGGACCAT CCCACCCCTG ACTCAGCCAT TCATGCTGGA GAAAGACATT 
701 TTGAGTTCGT ATGAGGTTGT CCATCGAATC CTCAAAGGGA AAATCACTGG 
751 TGCTTTGAAC TCGGCGGTAA CTGCTCCTGC ATCTAACTTG GCTGTTGTCC 
801 CTCCACTCCT GCCCTTGGGG TGTCTGCAGC AGGCTGCTGC CTAGGCCTGG 
851 ACACATTGCA CATCCTAAAG TTTGAAGAGT CTAAATAACG GGGCTTCCCT 
901 CAGCATGTTC CCTCTCCTGT TTGCCACGGA TCCAGAGCCA CCTGCCCTGT 
951 CTTCTCGTAC CCCTTTCACT CTTGAGGCCT GGGAGGTGAA AAAGGCCAGA 
1001 CTGTGCCCAG GATTGATTCA ATTTTGCTTT TACTCCCAGC TTCCCTCTCA 
1051 AAAGAGAGTG AAGTCTCATT TGTCATGTGT CTTCAGTTCC CCAACTTGGC 
1101 ATGAACATTT GAACCAAACA TAGGAAACTA CCATTAGGTT GAAAGCCTGA 
1151 GGCAGCTGGG ATGGTCTTTC TTGTGTCTCT TCTTTGCACC CCAGAGCATG 
1201 ATATAAGTGG TCCTAACAGA TTCTGGATAA TGGAGAAGCC CTCTGCTGGT 
1251 TTTCCTGGCA TTCCATGTAG AATAGGTAGA GAATATTTAA CCAATGAGCA 
1301 AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A 
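
The poly A stretch and polyadenylation signal reported for this insert can be located with a simple pattern search. The following is a minimal sketch, not the method used to produce the annotation: the function name `find_polya_features` and the `min_run` threshold are illustrative assumptions, and the hexamers AATAAA and ATTAAA are the commonly cited signal variants. Positions returned are 1-based within the string that is passed in, so they may differ from the reported positions by the numbering convention used.

```python
import re

def find_polya_features(seq, min_run=10):
    """Return 1-based starts of the first polyadenylation signal and of the
    terminal poly(A) run; either value is None when nothing is found."""
    seq = seq.upper().replace(" ", "")
    # Canonical signal hexamer and its most common variant.
    signal = re.search(r"AATAAA|ATTAAA", seq)
    # A run of at least `min_run` A residues that reaches the 3' end.
    run = re.search(r"A{%d,}$" % min_run, seq)
    return (signal.start() + 1 if signal else None,
            run.start() + 1 if run else None)

# Last 51 bases of the insert listed above (positions 1291 to 1341).
tail = "CCAATGAGCA AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A"
print(find_polya_features(tail))
```

Adding 1290 to the values printed for `tail` converts them to coordinates of the full insert.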


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 




Peptide information for frame 2 


ORF from 479 bp to 841 bp; peptide length: 121 
Category: questionable ORF 
Classification: no clue 

1 MRGTRCLAEY HLGDYGHAWN RCWVLDRVDT WAVVMFIDFG QLATIPVQSL 
51 RQLDSDDFWT IPPLTQPFML EKDILSSYEV VHRILKGKIT GALNSAVTAP 
101 ASNLAVVPPL LPLGCLQQAA A 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 2 
No Alert BLASTP hits found 

Peptide information for frame 3 


ORF from 87 bp to 635 bp; peptide length: 183 
Category: putative protein 
Classification: no clue 

1 MIQQPRAPLV LEKASGEGFG KTAAIIQLAP KAPVDLCETE KLRAAFFAVP 
51 LEMRGSFLVL LLRECFRDLS WLALIHSVRG EAGLLVTSIV PKTPFFWAMH 
101 ITEALHQNMQ ALFSTLAQAE EQQPYLEAPP LCAGLAVWQS TTWGIMDTPG 
151 TGVGCWTGWT PGLWSCSLIL DSWPPSLCSL CAS 
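
The ORF coordinates, reading frame and peptide length given in the peptide information blocks are related by simple arithmetic. The sketch below illustrates that relationship under stated assumptions: the reported ORF range is taken as 1-based, inclusive and without the stop codon (which is consistent with the figures listed here), and the helper names `orf_summary` and `translate` are hypothetical. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based inclusive ORF range."""
    length_bp = end - start + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length_bp // 3

def translate(dna):
    """Translate a DNA string codon by codon, ignoring spaces and any
    trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(orf_summary(479, 841))         # ORF reported for frame 2
print(orf_summary(87, 635))          # ORF reported for frame 3
print(translate("ATGATCCAGCAGCCT"))  # first five codons from position 87
```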


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2m20, frame 2 

Report for DKFZphtes3_2m20.2 

[LENGTH] 121 

[MW] 13436.69 

[pI] 5.81 

[KW] Alpha_Beta 

SEQ MRGTRCLAEYHLGDYGHAWNRCWVLDRVDTWAVVMFIDFGQLATIPVQSLRQLDSDDFWT 
PRD ccchhhhhcccccccccccceeeecccccccceeeeeecccccccccccccccccccccc 

SEQ IPPLTQPFMLEKDILSSYEVVHRILKGKITGALNSAVTAPASNLAVVPPLLPLGCLQQAA 
PRD cccccchhhhhhhcchhhhhhhhhhcccccchhhhhhcccccceeeeccccccccccccc 

SEQ A 
PRD c 

(No Prosite data available for DKFZphtes3_2m20.2) 
(No Pfam data available for DKFZphtes3_2m20.2) 

Pedant information for DKFZphtes3_2m20, frame 3 

Report for DKFZphtes3_2m20.3 

[LENGTH] 183 

[MW] 19971.49 

[pI] 5.31 

[KW] Alpha_Beta 

SEQ MIQQPRAPLVLEKASGEGFGKTAAIIQLAPKAPVDLCETEKLRAAFFAVPLEMRGSFLVL 

PRD ccccccccceeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhcchhhhhh 

SEQ LLRECFRDLSWLALIHSVRGEAGLLVTSIVPKTPFFWAMHITEALHQNMQALFSTLAQAE 

PRD hhhhhhcchhhhhhhhhhcccceeeeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EQQPYLEAPPLCAGLAVWQSTTWGIMDTPGTGVGCWTGWTPGLWSCSLILDSWPPSLCSL 

PRD hhhcccccccccccceeeecccceeecccccccccccccccccccceeeeccccccceee 

SEQ CAS 

PRD ccc 


(No Prosite data available for DKFZphtes3_2m20.3) 
(No Pfam data available for DKFZphtes3_2m20.3) 

group: testes derived 

DKFZphtes3_2n9 encodes a novel 184 amino acid protein with very weak similarity to Homo 
sapiens PAC clone DJ0771P04 from 7q11.21-q11.23. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

on genomic level encoded by HS1186N24, no splice pattern but EST 
matches 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1000 bp 

Poly A stretch at pos. 988, polyadenylation signal at pos. 970 


1 CAACTTTTTA AAGATGTGAA TTGGACAGCC AGACTTGCTT ATTTGTCTGA 
51 TATCTTCAGT ATTTTTTAAT GATCTTAATG CTTCTATGCA AGGGAAGAAT 
101 GCAACTTATT TTTCAATGGC AGATAAAGTT GAAGGACAAA AACAGAAGTT 
151 AGAAGCTTGG AAAAACAGAA TTTCTACAGA TTGTTATGAC ATGTTTCATA 
201 ATTTAACAAC AATTATCAAT GAAGTAGGTA ATGATCTTGA TATTGCACAT 
251 CTGCGAAAAG TTATCAGTGA ACATCTTACA AATTTGTTAG AATGTTTTGA 
301 ATTTTATTTT CCATCAAAAG AAGATCCACG CATAGGAAAT TTGTGGATCC 
351 AAAATCCATT TCTTTCATCA AAAGATAACT TAAATTTAAC TGTAACTCTA 
401 CAGGATAAGT TGTTGAAGCT GGCTACCGAC GAAGGATTGA AAATCAGTTT 
451 TGAAAATACA GCATCACTTC CTTCATTTTG GATAAAAGCT AAAAATGACT 
501 ATCCTGAGCT TGCTGAGATT GCTTTAAAAT TGCTGCTTCT TTTCCCCTCA 
551 ACATACCTCT GTGAGACCGG ATTCTCTACT TTAAGTGTTA TTAAAACAAA 
601 ACATAGAAAC AGTTTAAATA TACATTATCC CCTGAGGTAG CATTGTCATC 
651 AATCCAACCT AGATTAGACA AATTAACAAG CAAGAAGCAA GCTCACTTAT 
701 CACATTAAAA GCTTTAAATA TTGATATGTA AGGTATTGGT TCAAAGTATG 
751 CATATAAGCA TTGAGTGTGA GGAATTTGCT ATTTCACTTT AAACTTTCTG 
801 TCTAGTTACA GTTATGGAAG TATGAGAAGT TATGAGTGAA ACAGCAATTT 
851 TCTATATAAA TTGCCTATAT GTATATTTTC TUITTAAGAAT GTGTACAGTT 
901 TTTATAATTC TATTTTTCCT CATATTTGTC GTATTTATTA AAATATAATT 
951 TTAAATCTGT TGATTCTAAT ATTAAAACAT TTGATCTTAA AAAAAAAAAA 


BLAST Results 


Entry HS1186N24 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 1186N24 
Score = 4921, P = 5.8e-215, identities = 989/992 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 86 bp to 637 bp; peptide length: 184 
Category: similarity to unknown protein 
Classification: no clue 

1 MQGKNATYFS MADKVEGQKQ KLEAWKNRIS TDCYDMFHNL TTIINEVGND 
51 LDIAHLRKVI SEHLTNLLEC FEFYFPSKED PRIGNLWIQN PFLSSKDNLN 
101 LTVTLQDKLL KLATDEGLKI SFENTASLPS FWIKAKNDYP ELAEIALKLL 
151 LLFPSTYLCE TGFSTLSVIK TKHRNSLNIH YPLR 


BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2n9, frame 2 

TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC 
clone DJ0771P04 from 7q11.21-q11.23, complete sequence., N = 1, Score = 
94, P = 0.042 


>TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC clone 
DJ0771P04 from 7q11.21-q11.23, complete sequence. 
Length = 533 

HSPs: 

Score = 94 (14.1 bits), Expect = 4.3e-02, P = 4.2e-02 
Identities = 39/177 (22%), Positives = 75/177 (42%) 

Query: 1 MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLD-IAHLRKV 59 

+QG + M D + KL W+ ++ + F L + L+ I + ++ 

Sbjct: 354 LQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAEL 413 

Query: 60 ISEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLK 119 

+E L + F+ Y + + + +PF + D+++ LQ +++ L + LK 

Sbjct: 414 KTEFQKRLSD-FKLY ESELTL FSSPFSTKIDSVH--EELC^EVIDLQCNTVLK 463 

Query: 120 ISFENTASLPSFWIKAKNDYPXXXXXXXXXXXXFPSTYLCETGFSTLSVIKTKHRNSL 177 

++ +P F+ YP F STY+CE FS + + KTK+ + L 

Sbjct: 464 TKYDKVG-IPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQL 520 
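
The identity and positive counts in an HSP header can be reproduced from the match line of the alignment: a letter marks an identical pair and "+" marks a conservative substitution. The sketch below is illustrative only; the function names are hypothetical, and rounding percentages down is an assumption that happens to agree with the figures printed in these reports (for example 39/177 shown as 22%).

```python
def floor_percent(count, total):
    """Whole-number percentage, rounded down."""
    return count * 100 // total

def hsp_counts(midline):
    """Count identities (letters) and positives (letters plus '+')
    in the match line of one alignment block."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives

# Header values of the HSP shown above.
print(floor_percent(39, 177), floor_percent(75, 177))

# A short match line with six identities and one conservative substitution.
print(hsp_counts("VTV+IPK"))
```

Counts from several alignment blocks of one HSP are summed before the percentage is taken over the full aligned length.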


Pedant information for DKFZphtes3_2n9, frame 2 


Report for DKFZphtes3_2n9.2 


[LENGTH] 184 

[MW] 21203.53 

[pI] 6.52 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.52 % 


SEQ mqgknatyfsmadkvegqkqkleawknristdcydmfhnlttiinevgndldiahlrkvi 

SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhcccceeecccccccchhhhhhhhh 

SEQ sehltnllecfefyfpskedprignlwiqnpflsskdnlnltvtlqdkllklatdeglki 

SEG 

PRD hhhhhhhhhhhhcccccccccccceeeeccccccccccceeeeehhhhhhhhhhhcccee 

SEQ sfentaslpsfwikakndypelaeialkllllfpstylcetgfstlsviktkhrnslnih 

SEG xxxxxxxxxxxx 

PRD eecccccccceeeeecccchhhhhhhhhhhhhcccccccccccceeeeeecccccceeec 

SEQ yplr 

SEG 

PRD cccc 


(No Prosite data available for DKFZphtes3_2n9.2) 
(No Pfam data available for DKFZphtes3_2n9.2) 
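
The low-complexity percentage in the report above is consistent with the number of residues masked by "x" in the SEG lines divided by the peptide length. A minimal sketch of that calculation follows; the function name is hypothetical, and treating the percentage as masked residues over total length is an assumption inferred from the reported values, not a documented Pedant formula.

```python
def low_complexity_percent(seg_mask, length):
    """Percentage of residues masked with 'x' in a SEG line,
    rounded to two decimals."""
    masked = seg_mask.count("x")
    return round(100.0 * masked / length, 2)

# Twelve masked residues in a 184 residue peptide, as in the report above.
print(low_complexity_percent("xxxxxxxxxxxx", 184))
```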
DKFZphtes3_30f4 

group: testes derived 

DKFZphtes3_30f4 encodes a novel 192 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

Sequenced by LMU 

Locus: /map="717.2-8 cR from top of Chr8 linkage group" 

Insert length: 1388 bp 

Poly A stretch at pos. 1330, polyadenylation signal at pos. 1310 

1 CACTGAGCCC TCCTCAGATG GTTAGTGGCT TCCAACAGCC ATCAGGAGTG 
51 TTTCTTGAAT GCCCCAGGTG TGGAGGACTT GGTCTGTGAC CACCTAGAAC 

101 CCCAGAGCTG AACAGGAAGC CGTCCCTGCA GCAACAAGAG GGCTGGAAGG 

151 GGGAGCTGCA GGCCACCCTC GGCTCTCCCA CTGCTGGGGC GGTGATGTTC 

201 GGGTGACATG TTTGAAAAAT ACTCTTAAAG ATACCAACTG TTCCCTTATA 

251 TGGCTAATGG TTTGTGCAGC CACCAGCGAT GGCGGCCCCT ATTAGAGACC 

301 AGGTTTGTTA AAACACCAAA TATTGCTGTC CACACTAGAC ATTAACCGGC 

351 TTCAGAAAAG ATGGACACCT TTTCCCACGC TGTTTCGCTT CTTAACTTTG 

401 GTCCAGCTTT AGCCACCACA CAGCGTGTGA GGGACTGCTG CTGCGGAGTC 

451 AGCCTCGTTT GTCCCTCCGC CTCCCACCAG CATGCGCCGC TTCTGAGAGA 

501 CACCAGCTCC CTGCCTCCAA GCCTGGTGCC ACAGGCCTGT CGTGAGGGAC 

551 CCCTGCTTCC GAGAGCTCCT GGGGGGGTTC TGCCCTTCAC CACCTGGGAG 

601 AGGTGTCAGT TCAGTTCCGA GTTGAACAAG GCCCGTGCAC ACAGCATGTT 

651 GGGGGCCCAG CCCAAAGTTC TTGTCACCTC CTCATGCAAA GCCAGCCATC 

701 ACCCTCCGGC CAGAGCTCAA GGTGGCCCCT TGGCCAGCCC CTCCTTGGGT 

751 CCTCCAGGAG GACTGAGCAC CCCTCCTAGC GGCATCCCTT GCCCTCCACA 

801 GTGCTGCCAG GGGCACGTCG CTCTGTGCCG TGGACTGAGA CCATCCCCTG 

851 GTGACAGAAT GACCCGTTTG TTGGAAATGC CTCGTTGCCA GAGAAACTCC 

901 CCAGGCATCT CGGAACGAAA CTATTTAGTT CCATTGTGAA CTGGCCACGG 

951 GACAGCTTTT TATCAACTTA TTAAGTTGGA GCACTGTAAT CGCGCTTGCT 
1001 GAGTTAGCAG TGGTGGTAAG CGTGTGTTAA ACACATAATG TTACGTTTTA 
1051 GGAGAGAGAG GTCGTAAGGA AGTGTCGTGT CGCTCATGAC TCTCTTCTAT 
1101 TAGTTGGGTA ACAGTGGCCT CATGTTTGTG TCTGTGTGTA CACAGAGCCC 
1151 TTAGGTTCTG CTCTGTTTCT TTGCCAGGTG AATGTTTGTG GCATGCGCTG 
1201 CTGTCCGCGC CCCTCTGTCC TGCGCAGGGT TCAGCTGTGC GGCGCCCTGA 
1251 TTTCCTCCAT GCACACAGAA CCTCCTTGTG TCTGTTTCTC TGTTCCTCTG 
1301 TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAG 

BLAST Results 


Entry HS548358 from database EMBL: 
human STS EST67250. 
Score = 2126, P = 1.5e-89, identities = 444/472 

Entry HS670351 from database EMBL: 
human STS WI-18501. 
Score = 2089, P = 7.1e-88, identities = 445/476 


Medline entries 

No Medline entry 

Peptide information for frame 1 


ORF from 361 bp to 936 bp; peptide length: 192 
Category: putative protein 
Classification: no clue 

1 MDTFSHAVSL LNFGPALATT QRVRDCCCGV SLVCPSASHQ HAPLLRDTSS 
51 LPPSLVPQAC REGPLLPRAP GGVLPFTTWE RCQFSSELNK ARAHSMLGAQ 
101 PKVLVTSSCK ASHHPPARAQ GGPLASPSLG PPGGLSTPPS GIPCPPQCCQ 
151 GHVALCRGLR PSPGDRMTRL LEMPRCQRNS PGISERNYLV PL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30f4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_30f4, frame 1 


Report for DKFZphtes3_30f4.1 


[LENGTH] 192 

[MW] 20281.56 

[pI] 9.21 

[BLOCKS] BL01013C Oxysterol-binding protein family proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.94 % 


SEQ MDTFSHAVSLLNFGPALATTQRVRDCCCGVSLVCPSASHQHAPLLRDTSSLPPSLVPQAC 

SEG 

PRD ccchhhhheeecccccchhhhhhhhcccceeeeccccccccccccccccccccccccccc 

SEQ REGPLLPRAPGGVLPFTTWERCQFSSELNKARAHSMLGAQPKVLVTSSCKASHHPPARAQ 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhccccceeeeecccccccccccccc 

SEQ GGPLASPSLGPPGGLSTPPSGIPCPPQCCQGHVALCRGLRPSPGDRMTRLLEMPRCQRNS 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccchhhhhhhhcccccccchhhhhccccccccc 

SEQ PGISERNYLVPL 

SEG 

PRD cccccccccccc 


(No Prosite data available for DKFZphtes3_30f4.1) 
(No Pfam data available for DKFZphtes3_30f4.1) 

DKFZphtes3_35b4 


group: cell cycle 

DKFZphtes3_35b4 encodes a novel 1780 amino acid protein whose C-terminus is identical to human 
M-phase phosphoprotein-1 (MPP1). 

The novel protein contains an N-terminal Pfam kinesin motor domain and an ATP/GTP-binding site 
motif A (P-loop). MPP1 is expressed and phosphorylated in metaphase. Therefore, the novel 
protein seems to be involved in mitotic spindle function during cell division. 

The new protein can find application in modulation of the mitotic spindle. 


"M-phase phosphoprotein-1" extension 
motor protein 
Sequenced by DKFZ 

Locus: /map="750_H_1; 758_H_7; 759_C_9; 847_D_4; 906_D_1; 931_D_3; 944_C_1; 750_G_12; 
800_A_11; 512.1 cR from top of Chr10 linkage group" 

Insert length: 6284 bp 

No poly A stretch found, no polyadenylation signal found 


1 ATCGCAGTGC TGCTCGCGGG TCTGGCTAGT CAGGCGAAGT TTGCAGAATG 
51 GAATCTAATT TTAATCAAGA GGGAGTACCT CGACCATCTT ATGTTTTTAG 
101 TGCTGACCCA ATTGCAAGGC CTTCAGAAAT AAATTTCGAT GGCATTAAGC 
151 TTGATCTGTC TCATGAATTT TCCTTAGTTG CTCCAAATAC TGAGGCAAAC 
201 AGTTTCGAAT CTAAAGATTA TCTCCAGGTT TGTCTTCGAA TAAGACCATT 
251 TACACAGTCA GAAAAAGAAC TTGAGTCTGA GGGCTGTGTG CATATTCTGG 
301 ATTCACAGAC TGTTGTGCTG AAAGAGCCTC AATGCATCCT TGGTCGGTTA 
351 AGTGAAAAAA GCTCAGGGCA GATGGCACAG AAATTCAGTT TTTCCAAGGT 
401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC 
451 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC 
501 GGGCTAACCA ATTCAGGAAA AACATATACA TTTCAAGGGA CAGAAGAAAA 
551 TATTGGCATT CTGCCTCGAA CTTTGAATGT ATTATTTGAT AGTCTTCAAG 
601 AAAGACTGTA TACAAAGATG AACCTTAAAC CACATAGATC CAGAGAATAC 
651 TTAAGGTTAT CATCAGAACA AGAGAAAGAA GAAATTGCTA GCAAAAGTGC 
701 ATTGCTTCGG CAAATTAAAG AGGTTACTGT GCATAATGAT AGTGATGATA 
751 CTCTTTATGG AAGTTTAACT AACTCTTTGA ATATCTCAGA GTTTGAAGAA 
801 TCCATAAAAG ATTATGAACA AGCCAACTTG AATATGGCTA ATAGTATiiAA 
851 ATTTTCTGTG TGGGTTTCTT TCTTTGAAAT TTACAATGAA TATATTTATG 
901 ACTTATTTGT TCCTGTATCA TCTAAATTCC AAAAGAGAAA GATGCTGCGC 
951 CTTTCCCAAG ACGTAAAGGG CTATTCTTTT ATAAAAGATC TACAATGGAT 
1001 TCAAGTATCT GATTCCAAAG AAGCCTATAG ACTTTTAAAA CTAGGAATAA 
1051 AGCACCAGAG TGTTGCCTTC ACAAAATTGA ATAATGCTTC CAGTAGAAGT 
1101 CACAGCATAT TCACTGTTAA AATATTACAG ATTGAAGATT CTGAAATGTC 
1151 TCGTGTAATT CGAGTCAGTG AATTATCTTT ATGTGATCTT GCTGGTTCAG 
1201 AACGAACTAT GAAGACACAG AATGAAGGTG AAAGGTTAAG AGAGACTGGG 
1251 AATATCAACA CTTCTTTATT GACTCTGGGA AAGTGTATTA ACGTCTTGAA 
1301 GAATAGTGAA AAGTCAAAGT TTCAACAGCA TGTGCCTTTC CGGGAAAGTA 
1351 AACTGACTCA CTATTTTCAA AGTTTTTTTA ATGGTAAAGG GAAAATTTGT 
1401 ATGATTGTCA ATATCAGCCA ATGTTATTTA GCCTATGATG AAACACTCAA 
1451 TGTATTGAAG TTCTCCGCCA TTGCACAAAA AGTTTGTGTC CCAGACACTT 
1501 TAAATTCCTC TCAAGATAAA TTATTTGGAC CTGTCAAATC TTCTCAAGAT 
1551 GTATCACTAG ACAGTAATTC AAACAGTAAA ATATTAAATG TAAAAAGAGC 
1601 CACCATTTCA TGGGAAAATA GTCTAGAAGA TTTGATGGAA GACGAGGATT 
1651 TGGTTGAGGA GCTAGAAAAC GCTGAAGAAA CTCAAAATGT GGAAACTAAA 
1701 CTTCTTGATG AAGATCTAGA TAAAACATTA GAGGAAAATA AGGCTTTCAT 
1751 TAGCCACGAG GAGAAAAGAA AACTGTTGGA CTTAATAGAA GACTTGAAAA 
1801 AAAAACTGAT AAATGAAAAA AAGGAAAAAT TAACCTTGGA ATTTAAAATT 
1851 CGAGAAGAAG TTACACAGGA GTTTACTCAG TATTGGGCTC AACGGGAAGC 
1901 TGACTTTAAG GAGACTCTGC TTCAAGAACG AGAGATATTA GAAGAAAATG 
1951 CTGAACGTCG TTTGGCTATC TTCAAGGATT TGGTTGGTAA ATGTGACACT 
2001 CGAGAAGAAG CAGCGAAAGA CATTTGTGCC ACAAAAGTTG AAACTGAAGA 
2051 AGCTACTGCT TGTTTAGAAC TAAAGTTTAA TCAAATTAAA GCTGAATTAG 
2101 CTAAAACCAA AGGAGAATTA ATCAAAACCA AAGAAGAGTT AAAAAAGAGA 
2151 GAAAATGAAT CAGATTCATT GATTCAAGAG CTTGAGACAT CTAATAAGAA 
2201 AATAATTACA CAGAATCAAA GAATTAAAGA ATTGATAAAT ATAATTGATC 
2251 AAAAAGAAGA TACTATCAAC GAATTTCAGA ACCTAAAGTC TCATATGGAA 
2301 AACACATTTA AATGCAATGA CAAGGCTGAT ACATCTTCTT TAATAATAAA 
2351 CAATAAATTG ATTTGTAATG AAACAGTTGA AGTACCTAAG GACAGCAAAT 
2401 CTAAAATCTG TTCAGAAAGA AAAAGAGTAA ATGAAAATGA ACTTCAGCAA 
2451 GATGAACCAC CAGCAAAGAA AGGGTCTATC CATGTTAGTT CAGCTATCAC 
2501 TGAAGACCAA AAGAAAAGTG AAGAAGTGCG ACCGAACATT GCAGAAATTG 
2551 AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA 


2601 CTCACTATTG AGAATGAACT TAAAAATGAA AAGGAAGAAA AAGCAGAATT 
2651 AAATAAACAG ATTGTTCATT TTCAGCAGGA ACTTTCTCTT TCTGAAAAAA 
2701 AGAATTTAAC TTTAAGTAAA GAGGTCCAAC AAATTCAGTC AAATTATGAT 
2751 ATTGCAATTG CTGAATTACA TGTGCAGAAA AGTAAAAATC AAGAACAGGA 
2801 GGAAAAGATC ATGAAATTGT CAAATGAGAT AGAAACTGCT ACAAGAAGCA 
2851 TTACAAATAA TGTTTCACAA ATAAAATTAA TGCACACGAA AATAGACGAA 
2901 CTACGTACTC TTGATTCAGT TTCTCAGATT TCAAACATAG ATTTGCTCAA 
2951 TCTCAGGGAT CTGTCAAATG GTTCTGAGGA GGATAATTTG CCAAATACAC 
3001 AGTTAGACCT TTTAGGTAAT GATTATTTGG TAAGTAAGCA AGTTAAAGAA 
3051 TATCGAATTC AAGAACCCAA TAGGGAAAAT TCTTTCCACT CTAGTATTGA 
3101 AGCTATTTGG GAAGAATGTA AAGAGATTGT GAAGGCCTCT TCCAAAAAAA 
3151 GTCATCAGAT TGAGGAACTG GAACAACAAA TTGAAAAATT GCAGGCAGAA 
3201 GTAAAAGGCT ATAAGGATGA AAACAATAGA CTAAAGGAGA AGGAGCATAA 
3251 AAACCAAGAT GACCTACTAA AAGAAAAAGA AACTCTTATA CAGCAGCTGA 
3301 AAGAAGAATT GCAAGAAAAA AATGTTACTC TTGATGTTCA AATACAGCAT 
3351 GTAGTTGAAG GAAAGAGAGC GCTTTCAGAA CTTACACAAG GTGTTACTTG 
3401 CTATAAGGCA AAAATAAAGG AACTTGAAAC AATTTTAGAG ACTCAGAAAG 
3451 TTGAACGTAG TCATTCAGCC AAGTTAGAAC AAGACATTTT GGAAAAGGAA 
3501 TCTATCATCT TAAAGCTAGA AAGAAATTTG AAGGAATTTC AAGAACATCT 
3551 TCAGGATTCT GTCAAAAACA CCAAAGATTT AAATGTAAAG GAACTCAAGC 
3601 TGAAAGAAGA AATCACACAG TTAACAAATA ATTTGCAAGA TATGAAACAT 
3651 TTACTTCAAT TAAAAGAAGA AGAAGAAGAA ACCAACAGGC AAGAAACAGA 
3701 AAAATTGAAA GAGGAACTCT CTGCAAGCTC TGCTCGTACC CAGAATCTGA 
3751 AAGCAGATCT TCAGAGGAAG GAAGAAGATT ATGCTGACCT GAAAGAGAAA 
3801 CTGACTGATG CCAAAAAGCA GATTAAGCAA GTACAGAAAG AGGTATCTGT 
3851 AATGCGTGAT GAGGATAAAT TACTGAGGAT TAAAATTAAT GAACTGGAGA 
3901 AAAAGAAAAA CCAGTGTTCT CAGGAATTAG ATATGAAGCA GCGAACCATT 
3951 CAGCAACTCA AGGAGCAGTT AAATAATCAG AAAGTGGAAG AAGCTATACA 
4001 ACAGTATGAG AGAGCATGCA AAGATCTAAA TGTTAAAGAG AAAATAATTG 
4051 AAGACATGCG AATGACACTA GAAGAACAGG AACAAACTCA GGTAGAACAG 
4101 GATCAAGTGC TTGAGGCTAA ATTAGAGGAA GTTGAAAGGC TGGCCACAGA 
4151 ATTGGAAAAA TGGAAGGAAA AATGCAATGA TTTGGAAACC AAAAACAATC 
4201 AAAGGTCAAA TAAAGAACAT GAGAACAACA CAGATGTGCT TGGAAAGCTC 
4251 ACTAATCTTC AAGATGAGTT ACAGGAGTCT GAACAGAAAT ATAATGCTGA 
4301 TAGAAAGAAA TGGTTAGAAG AAAAAATGAT GCTTATCACT CAAGCGAAAG 
4351 AAGCAGAGAA TATACGAAAT AAAGAGATGA AAAAATATGC TGAGGACAGG 
4401 GAGCGTTTTT TTAAGCAACA GAATGAAATG GAAATACTGA CAGCCCAGCT 
4451 GACAGAGAAA GATAGTGACC TTCAAAAGTG GCGAGAAGAA CGAGATCAAC 
4501 TGGTTGCAGC TTTAGAAATA CAGCTAAAAG CACTGATATC CAGTAATGTA 
4551 CAGAAAGATA ATGAAATTGA ACAACTAAAA AGGATCATAT CAGAGACTTC 
4601 TAAAATAGAA ACACAAATCA TGGATATCAA GCCCAAACGT ATTAGTTCAG 
4651 CAGATCCTGA CAAACTTCAA ACTGAACCTC TATCGACAAG TTTTGAAATT 
4701 TCCAGAAATA AAATAGAGGA TGGATCTGTA GTCCTTGACT CTTGTGAAGT 
4751 GTCAACAGAA AATGATCAAA GCACTCGATT TCCAAAACCT GAGTTAGAGA 
4801 TTCAATTTAC ACCTTTACAG CCAAACAAAA TGGCAGTGAA ACACCCTGGT 
4851 TGTACCACAC CAGTGACAGT TGAGATTCCC AAGGCTCGGA AGAGGAAGAG 
4901 TAATGAAATG GAGGAGGACT TGGTGAAATG TGAAAATAAG AAGAATGCTA 
4951 CACCCAGAAC TAATTTGAAA TTTCCTATTT CAGATGATAG AAATTCTTCT 
5001 GTCAAAAAGG AACAAAAGGT TGCCATACGT CCATCATCTA AGAAAACATA 
5051 TTCTTTACGG AGTCAGGCAT CCATAATTGG TGTAAACCTG GCCACTAAGA 
5101 AAAAAGAAGG AACACTACAG AAATTTGGAG ACTTCTTACA ACATTCTCCC 
5151 TCAATTCTTC AATCAAAAGC AAAGAAGATA ATTGAAACAA TGAGCTCTTC 
5201 AAAGCTCTCA AATGTAGAAG CAAGTAAAGA AAATGTGTCT CAACCAAAAC 
5251 GAGCCAAACG GAAATTATAC ACAAGTGAAA TTTCATCTCC TATTGATATA 
5301 TCAGGCCAAG TGATTTTAAT GGACCAGAAA ATGAAGGAGA GTGATCACCA 
5351 GATTATCAAA CGACGACTTC GAACAAAAAC AGCCAAATAA ATCACTTATG 
5401 GAAATGTTTA ATATAAATTT TATAGTCATA GTCATTGGAA CTTGCATCCT 
5451 GTATTGTAAA TATAAATGTA TATATTATGC ATTAAATCAC TCTGCATATA 
5501 GATTGCTGTT TTATACATAG TATAATTTTA ATTCAATAAA TGAGTCAAAA 
5551 TTTGTATATT TTTATAAGGC TTTTTTATAA TAGCTTCTTT CAAACTGTAT 
5601 TTCCCTATTA TCTCAGACAT TGGATCAGTG AAGATCCTAG GAAAGAGGCT 
5651 GTTATTCTCA TTTATTTTGC TATACAGGAT GTAATAGGTC AGGTATTTGG 
5701 TTTACTTATA TTTAACAATG TCTTATGAAT TTTTTTTACT TTATCTGTTA 
5751 TACAACTGAT TTTACATATC TGTTTGGATT ATAGCTAGGA TTTGGAGAAT 
5801 AAGTGTGTAC AGATCACAAA ACATGTATAT ACATTATTTA GAAAAGATCT 
5851 CAAGTCTTTA ATTAGAATGT CTCACTTATT TTGTAAACAT TTTGTGGGTA 
5901 CATAGTACAT GTATATATTT ACGGGGTATG TGAGATGTTT TGACACAGGC 
5951 ATGCAATGTG T^TACGTGT ATCATGGAGA ATGAGGTATC CATCCCCTCA 
6001 AGCATTTTTC CTTTGAATTA CAGATAATCC AATTACATTC TTTAGATCAT 
6051 TTAAAAATAT ACAAGTAAGT TATTATTGAT TATAGTCACT CTATTGTGCT 
6101 ATCAGATAGT AGATCATTCT TTTTATCTTA TTTGTTTTTG TACCCATTAA 
6151 CCATCCCCAC CTCCCCCTGC AACCGTCAGT ACCCTTACCA GCCACTGGTA 
6201 ACCATTCTTC TACTCTGTAT GCCCATGAGG TCAATTGATT TTATTTTTAG 
6251 ATCCCATAAA TAAATGAGAA CATGCAAAAA AAAA 


BLAST Results 


Entry HS898149 from database EMBL: 
human STS WI-9217. 
Score = 4247, P = 1.5e-187, identities = 855/862 


Medline entries 


94119956: 

Cloning of cDNAs for M-phase phosphoproteins recognized 
by the MPM2 monoclonal antibody and determination of the 
phosphorylated epitope. 

98101856: 

Interaction of a Golgi-associated kinesin-like protein with 
Rab6. 

95122643: 

Identification and partial characterization of mitotic 
centromere-associated kinesin, a kinesin-related protein that 
associates with centromeres during mitosis. 


Peptide information for frame 3 


ORF from 48 bp to 5387 bp; peptide length: 1780 
Category: known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (152-160) 


1 MESNFNQEGV PRPSYVFSAD PIARPSEINF DGIKLDLSHE FSLVAPNTEA 
51 NSFESKDYLQ VCLRIRPFTQ SEKELESEGC VHILDSQTVV LKEPQCILGR 
101 LSEKSSGQMA QKFSFSKVFG PATTQKEFFQ GCIMQPVKDL LKGQSRLIFT 
151 YGLTNSGKTY TFQGTEENIG ILPRTLNVLF DSLQERLYTK MNLKPHRSRE 
201 YLRLSSEQEK EEIASKSALL RQIKEVTVHN DSDDTLYGSL TNSLNISEFE 
251 ESIKDYEQAN LNMANSIKFS VWVSFFEIYN EYIYDLFVPV SSKFQKRKML 
301 RLSQDVKGYS FIKDLQWIQV SDSKEAYRLL KLGIKHQSVA FTKLNNASSR 
351 SHSIFTVKIL QIEDSEMSRV IRVSELSLCD LAGSERTMKT QNEGERLRET 
401 GNINTSLLTL GKCINVLKNS EKSKFQQHVP FRESKLTHYF QSFFNGKGKI 
451 CMIVNISQCY LAYDETLNVL KFSAIAQKVC VPDTLNSSQD KLFGPVKSSQ 
501 DVSLDSNSNS KILNVKRATI SWENSLEDLM EDEDLVEELE NAEETQNVET 
551 KLLDEDLDKT LEENKAFISH EEKRKLLDLI EDLKKKLINE KKEKLTLEFK 
601 IREEVTQEFT QYWAQREADF KETLLQEREI LEENAERRLA IFKDLVGKCD 
651 TREEAAKDIC ATKVETEEAT ACLELKFNQI KAELAKTKGE LIKTKEELKK 
701 RENESDSLIQ ELETSNKKII TQNQRIKELI NIIDQKEDTI NEFQNLKSHM 
751 ENTFKCNDKA DTSSLIINNK LICNETVEVP KDSKSKICSE RKRVNENELQ 
801 QDEPPAKKGS IHVSSAITED QKKSEEVRPN IAEIEDIRVL QENNEGLRAF 
851 LLTIENELKN EKEEKAELNK QIVHFQQELS LSEKKNLTLS KEVQQIQSNY 
901 DIAIAELHVQ KSKNQEQEEK IMKLSNEIET ATRSITNNVS QIKLMHTKID 
951 ELRTLDSVSQ ISNIDLLNLR DLSNGSEEDN LPNTQLDLLG NDYLVSKQVK 
1001 EYRIQEPNRE NSFHSSIEAI WEECKEIVKA SSKKSHQIEE LEQQIEKLQA 
1051 EVKGYKDENN RLKEKEHKNQ DDLLKEKETL IQQLKEELQE KNVTLDVQIQ 
1101 HVVEGKRALS ELTQGVTCYK AKIKELETIL ETQKVERSHS AKLEQDILEK 
1151 ESIILKLERN LKEFQEHLQD SVKNTKDLNV KELKLKEEIT QLTNNLQDMK 
1201 HLLQLKEEEE ETNRQETEKL KEELSASSAR TQNLKADLQR KEEDYADLKE 
1251 KLTDAKKQIK QVQKEVSVMR DEDKLLRIKI NELEKKKNQC SQELDMKQRT 
1301 IQQLKEQLNN QKVEEAIQQY ERACKDLNVK EKIIEDMRMT LEEQEQTQVE 
1351 QDQVLEAKLE EVERLATELE KWKEKCNDLE TKNNQRSNKE HENNTDVLGK 
1401 LTNLQDELQE SEQKYNADRK KWLEEKMMLI TQAKEAENIR NKEMKKYAED 
1451 RERFFKQQNE MEILTAQLTE KDSDLQKWRE ERDQLVAALE IQLKALISSN 
1501 VQKDNEIEQL KRIISETSKI ETQIMDIKPK RISSADPDKL QTEPLSTSFE 
1551 ISRNKIEDGS VVLDSCEVST ENDQSTRFPK PELEIQFTPL QPNKMAVKHP 
1601 GCTTPVTVEI PKARKRKSNE MEEDLVKCEN KKNATPRTNL KFPISDDRNS 
1651 SVKKEQKVAI RPSSKKTYSL RSQASIIGVN LATKKKEGTL QKFGDFLQHS 
1701 PSILQSKAKK IIETMSSSKL SNVEASKENV SQPKRAKRKL YTSEISSPID 
1751 ISGQVILMDQ KMKESDHQII KRRLRTKTAK 
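
The ATP_GTP_A entry listed under Prosite motifs corresponds to the P-loop consensus [AG]-x(4)-G-K-[ST], which can be searched for with a regular expression. The sketch below is illustrative: the function name `find_p_loops` and the `offset` parameter are hypothetical, and the fragment is residues 141 to 170 copied from the peptide above. The match it finds starts at residue 152; the end coordinate it returns is inclusive, which may differ by one from the convention used in the reported range.

```python
import re

# PROSITE-style P-loop consensus [AG]-x(4)-G-K-[ST] as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_p_loops(protein, offset=1):
    """Return (start, end, match) tuples with 1-based inclusive coordinates.
    `offset` is the sequence position of the first residue of `protein`."""
    return [(m.start() + offset, m.end() + offset - 1, m.group())
            for m in P_LOOP.finditer(protein)]

# Residues 141 to 170 of the 1780 amino acid peptide.
fragment = "LKGQSRLIFTYGLTNSGKTYTFQGTEENIG"
print(find_p_loops(fragment, offset=141))
```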

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b4, frame 3 

TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds., N = 1, Score = 3743, P = 0 

PIR:A36881 MPM2-reactive phosphoprotein 1 - human (fragment), N = 2, 
Score = 2808, P = 2.5e-294 

TREMBL:AF070672_1 product: "rabkinesin6"; Homo sapiens rabkinesin6 
mRNA, complete cds., N = 2, Score = 680, P = 2.6e-99 


>TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds. 
Length = 753 

HSPs: 

Score = 3743 (561.6 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 752/753 (99%), Positives = 753/753 (100%) 

Query: 1028 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 1087 

VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 60 

Query: 1088 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 
Sbjct: 61 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 120 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 1207 

LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 
Sbjct: 121 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 180 

Query: 1208 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 1267 

EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 
Sbjct: 181 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 240 

Query: 1268 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 1327 

VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 
Sbjct: 241 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 300 

Query: 1328 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 1387 

NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 
Sbjct: 301 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 360 

Query: 1388 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 1447 

NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 
Sbjct: 361 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 420 

Query: 1448 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 1507 

AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 
Sbjct: 421 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 480 

Query: 1508 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 1567 

EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 
Sbjct: 481 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 540 

Query: 1568 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK 1627 

VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTV+IPKARKRKSNEMEEDLVK 
Sbjct: 541 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK 600 

Query: 1628 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 1687 

CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 
Sbjct: 601 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 660 

Query: 1688 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 1747 

GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 
Sbjct: 661 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 720 

Query: 1748 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 1780 

PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 
Sbjct: 721 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 753 

Score = 197 (29.6 bits), Expect = 2.1e-11, P = 2.1e-11 
Identities = 114/542 (21%), Positives = 253/542 (46%) 

Query: 692 IKTKEELKKRENESDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHM- 750 

+K + + E + I++L+ K +N R+KE + ++D + E + L + 
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEH--KNQDDLLKEKETLIQQLK 58 

Query: 751 ENTFKCNDKADTS-SLIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAK-- 807 

E + N D ++ K +E + K+KI E + + E + + AK 

Sbjct: 59 EELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKI-KELETILETQKVERSHSAKLE 117 

Query: 808 KGSIHVSSAITEDQKKSEEVRPNIAE-IEDIRVLQENNEGLRAFLLTIENELKNEK 862 
+ + S I + ++ +E + ++ + +++ + L L+ + + N L++ K 

Sbjct: 118 QDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQ 177 

Query: 863 --EEKAELNKQIVH-FQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEE 919 

EE+ E N+Q ++ELS S + L ++Q+ + +Y A+L K K + ++ 
Sbjct: 178 LKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDY ADL KEKLTDAKK 230 

Query: 920 KIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQISNIDLLNLRDLSNGSEE 978 

+1 ++ E+ S+ + + KL+ KI+EL .+ + SQ +D+ R + E+ 

Sbjct: 231 QIKQVQKEV SVMRD— EDKLLRIKINELEKKKNQCSQ— ELDMKQ-RTIQQLKEQ 280 

Query: 979 DNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAIWEECKEIVKASSKKSHQI 1038 

N N +++ Y + K+ ++E E+ ++E + E + K ++ 

Sbjct: 281 LN— NQKVEEAIQQY — ERACKDLNVKEKIIED-MRMTLEEQEQTQVEQDQVLEAKLEEV 335 

Query: 1039 EELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEELQEKNVT 1094 

E L ++EK + + + +NN+ KEH+N D+L + L +Ij+E Q+ N 
Sbjct: 336 ERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKW 395 

Query: 1095 LDVQIQHVVEGKRA LSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

L++++ + KA +-f + + + E+E IL Q E+ + ++ 

Sbjct: 396 LEEKMMLITQAKEAENIRNKEMKKYAEDRERFFKQQHEME-ILTAQLTEKDSDLQKWRE- 453 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELK-LKEEITQLTNNLQDMKHLLQLK 1206 

E++ ++ LE LK + +V+ KD +++LK + E +++ + D+K + 
Sbjct: 454 -ERDQLVAALEIQLKAL ISSNVQ— KDNEIEQLKRIISETSKIETQIMDIK PKR 504 

Query: 1207 EEEEETNRQETEKLKEELSASSARTQN 1233 

+ ++ +TE L S + ++ 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIED 531 

Score = 186 (27.9 bits), Expect = 3.2e-10, P = 3.2e-10 
Identities = 131/674 (19%), Positives = 294/674 (43%) 

Query: 673 LELKFNQIKAELAKTKGELIKT-KEELKKRENESDSLIQELETSNKKIITQNQRIKELIN 731 

L+ K ++ + +L K K LI+ KEEL+++ D IQ + + + Q + 
Sbjct: 35 LKEKEHKNQDDLLKEKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKA 94 

Query: 732 IIDQKEDTINEFQNL-KSHMENTFKCNDKADTSSLIINNKLICNETVEVPKDSKSKICSE 790 

I + E TI E Q + +SH + D + S+I+ + EE +DS 
Sbjct: 95 KIKELE-TILETQKVERSHSAKLEQ— DILEKESIILKLERNLKEFQEHLQDSr VKN 147 

Query: 791 RKRVNENELQ-QDEPPAKKGSIHVSSAITEDQKKSEEV-RPNIAEI-EDIRVLQENNEGL 847 

K +N EL+ ++E ++ + + +++ EE R ++ E++ + L 

Sbjct: 148 TKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNL 207 

Query: 848 RAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQI QSNYDI 902 

+A L E + + KE+ + KQI Q+E+S+ ++ L ++ ++ Q + ++ 

Sbjct: 208 KADLQRKEEDYADLKEKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL 267 

Query: 903 AIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQI 961 

+ + +Q+ K Q +K+ + +EA + + I+M ++E +T Q+ 

Sbjct: 268 DMKQRTIQQLKEQLNNQKVEEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQV 327 

Query: 962 SNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRI— QEPNRENSFHSSIEA 1019 

L + L+ E+ L+ N + + + N ++ S + 

Sbjct: 328 LEAKLEEVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQK 387 

Query: 1020 IWEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQ— DDLLKEK 1077 

+ K+ ++ Q +E E K E+K Y ++ R +++++ + L EK 

Sbjct: 388 YNADRKKWLEEKMMLITQAKEAENIRNK EMKKYAEDRERFFKQQNEMEILTAQLTEK 444 

Query: 1078 ETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVER 1137 

++ +Q+ +EE + L++Of+ ++ + + ++ ++ET + K +R 

Sbjct: 445 DSDLQKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKR 504 

Query: 1138 SHSAKLEQDILEKESIILKLERNLKEFQEHLQDS VKNTKDLNVKELKLKEEITQLT 1193 

SA ++ E S ++ RN E + DS +N + + +L+ + T L 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQ 564 

Query: 1194 NNLQDMKH— LLQLKEEEEETNRQETEKLKEEL-SASSARTQNLKADLQRKEEDYADLK- 1249 

N +KH + + + ++++ +++E+L + + + +L+ D + 

Sbjct: 565 PNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSS 624 

Query: 1250 EKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL-DMKQRTIQQLKEQL 1308 

K + K 1+ K+ +R + + I +N KKK Q+ D Q + L+ + 
Sbjct: 625 VK-KEQKVAIRPSSKKTYSLRSQASI — IGVNLATKKKEGTLQKFGDFLQHSPSILQSKA 681 

Query: 1309 NNQKVEEAIQQYERACKDLNVKEKIIEDNR 1338 

+K+ E+ ++ ++KE++ R 
Sbjct: 682 —KKIIETMSSSKLSNVEAS-KENVSQPKR 708 
Score = 165 (24.8 bits), Expect = 5.8e-08, P = 5.8e-08 
Identities = 140/626 (22%), Positives = 271/626 (43%) 

Query: 536 VEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEK- 594 

+EELE E E K +D + L+E + H+ + LL EL ++L E +EK 
Sbjct: 11 lEELEQQIEKLQAEVKGY-KDENNRLKEKE HKNQDDLLKEKETLIQQLKEELQEKN 65 

Query: 595 LTLEFKIREEVT QEETQYWAQREADFKE — TLLQEREILEENAERRLAIFKDLVG 647 

+TL+ +1+ V E TQ +A KE T+L+ +++ E + +L +D++ 

Sbjct: 66 VTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV-ERSHSAKLE— QDILE 122 

Query: 648 KCDT REEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENE 704 

K E K+ ++ + T L +K ++K E+ + L K L+ +E E 

Sbjct: 123 KESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEE 182 

Query: 705 SDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSS 764 

++ QE E +++ + R + L + +KE+ ++ ++ K K+S 
Sbjct: 183 EETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQK-EVSV 241 

Query: 765 LIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKS 824 

+ +KL+ + E+ K K CS+ + + +QQ + V AI + ++ 

Sbjct: 242 MRDEDKLLRIKINELEK— KKNQCSQELDMKQRTIQQLKEQLNNQK— VEEAIQQYERAC 297 

Query: 825 EEVRPNIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEK 884 
+++ IED+R+ E E + + + L+ + EE L ++ ++++ + E 

Sbjct: 298 KDLNVKEKIIEDMRMTLEEQEQTQ---VEQDQVLEAKLEEVERLATELEKWKEKCNDLET 354 

Query: 885 KNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNE-IETATRSITN N 938 

KN S + + ++N D+ + +L + + QE E+K + +E IT N 

Sbjct: 355 KNNQRSNK— EHENNTDV-LGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAEN 411 

Query: 939 VSQIKLMHTKIDELRTLDSVSQISNIDL-LNLRD— LSNGSEEDNLPNTQLDLLGNDYLV 995 

+ ++ D R +++ + L +D L EE + L++ + 

Sbjct: 412 IRNKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALIS 471 

Query: 996 SKQVKEYRIQEPNRENSFHSSIEA-IWE-ECKEIVKASSKKSHQIEELEQQIEKLQAEVK 1053 

S K+ I++ RSSIEI + + KIA KQEL E + +++ 
Sbjct: 472 SNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKL-QTEPLSTSFEISRNKIE 530 

Query: 1054 GYKDENNRLKEKEHKNQDDLLKEKE TLIQQLKEELQEKNVTLDVQIQHVVEGKRA 1108 

+ + +Q + E T +Q K ++ T V ++ KR 

Sbjct: 531 DGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRK 590 

Query: 1109 LSELTQG-VTCYKAKIKELETILETQ-KVERSHSAKLEQDILEKES 1152 

+E+ + V C K T L+ +R+ S K EQ + + S 

Sbjct: 591 SNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPS 636 

Score = 143 (21.5 bits), Expect = 1.3e-05, P = 1.3e-05 
Identities = 164/684 (23%), Positives = 304/684 (44%) 

Query: 295 QKRKMLR-LSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASS 349 

+K +++ L ++++ + D+Q V + K A L G+ +L 
Sbjct: 49 EKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV 108 

Query: 350 -RSHSI-FTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGE-RLRETGNINTS 406 

RSHS IL+ E + +E LS+KNE +L+E T+ 

Sbjct: 109 ERSHSAKLEQDILEKESIILKLERNLKEFQE-HLQDSVKNTKDLNVKELKLKEEITQLTN 167 

Query: 407 LLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDET 466 

L K + LK E+ +Q + +L+ N K + + Y E 

Sbjct: 168 NLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADL QRKEEDYADLKEK 224 

Query: 467 LNVLKFSAIAQKVCVPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSL 526 

L K I Q V ++ +DKL +K ++ + N S+ L++K+ TI 
Sbjct: 225 LTDAK-KQIKQ-VQKEVSVMRDEDKLLR-IKINE-LEKKKNQCSQELDMKQRTIQQLKEQ 280 

Query: 527 EDLMEDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLKK 585 

+ + E+ +++ E A + NV+ K++ ED+ TLEE + + E+ ++L+ +E++++ 
Sbjct: 281 LNNQKVEEAIQQYERACKDLNVKEKII-EDMRMTLEEQEQ— TQVEQDQVLEAKLEEVER 337 

Query: 586 KLIN-EK-KEKLT-LEFKIREEVTQEFTQYWAQREADFKETLLQEREILEE NAERR 638 

EK KEK LE K + +E + K T LQ+ E+ E NA+R+ 

Sbjct: 338 LATELEKWKEKCNDLETKNNQRSNKEHEN NTDVLGKLTNLQD-ELQESEQKYNADRK 393 

Query: 639 LAIFKDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEEL 698 

+ + ++ T+ + A++I K E ++ E F Q + E+ +L + +L 

Sbjct: 394 KWLEEKMM--LITQAKEAENI-RNK-EMKKYAEDRERFFKQ-QNEMEILTAQLTEKDSDL 448 

Query: 699 KKRENESDSLIQELETSNKKIITQN-QR---IKELINIIDQKEDTINEFQNLKSHMENTF 754 
+K E D L+ LE K +I+ N Q+ I++L II + + ++K ++ 
Sbjct: 449 QKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSA 508 

Query: 755 KCNDKADTSSLIINNKLICN— ETVEVPKDSKSKICSERK— RVNENELQ-QDEP— PA 806 

DK T L + ++ N E V DS ++ +E R + EL+ Q P P 

Sbjct: 509 D-PDKLQTEPLSTSFEISRNKIEDGSVVLDS-CEVSTENDQSTRFPKPELEIQFTPLQPN 566 

Query: 807 KKGSIH— VSSAITEDQKKSEEVRPNIAEIEDIRVLQENNEGLRA— -FLLTIENELKNE 861 

K H +++T K+ + + NE + ++ +N R F+++ + 
Sbjct: 567 KMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVK 626 

Query: 862 KEEKAEL-— NKQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQE 918 

KE+K + +K+ +-»-$+ NL K+ +Q D + +SK ++ 

Sbjct: 627 KEQKVAIRPSSKKTYSLRSQASIIGV-NLATKKKEGTLQKFGDFLQHSPSILQSKAKKII 685 

Query: 919 EKIM— KLSNEIETATRSITNNVSQIKLMHTKI«-DELRT-LDSVSQISNID 965 

E + KLSN +E + NVSQ K " K+ E+ + +D Q+ +D 

Sbjct: 686 ETMSSSKLSN-VEASKE NVSQPKRAKRKLYTSEISSPIDISGQVILMD 732 

Score = 133 (20.0 bits), Expect = 1.6e-04, P = 1.6e-04 
Identities = 94/426 (22%), Positives = 188/426 (44%) 

Query: 527 EDLM-EDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLK 584 

+DL+ E E L+++L+ + +NV LD + +E +A + I++L+ 

Sbjct: 44 DDLLKEKETLIQQLKEELQEKNVT— LDVQIQHVVEGKRALSELTQGVTCYKAKIKELE 100 

Query: 585 KKLINEKKEKLTLEFKIREEVTQ-EFTQYWAQREA-DFKETLLQEREILEENAERRLAIF 642 

L +K E+ + K+ + + + + E +R +F+E L + ++ + L + 

Sbjct: 101 TILETQKVER-SHSAKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKL- 158 

Query: 643 KDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRE 702 

+ + + K + K E EE + ++K EL+ + K +L+++E 

Sbjct: 159 KEEITQLTNNLQDMKHLLQLKEEEEETN-— RQETEKLKEELSASSARTQNLKADLQRKE 215 

Query: 703 NESDSLIQELETSNKKIITQNQRIKELINIIDQK-EDTINEFQNLKSHMENTFKCNDKA- 760 

+ L ++L T KK I Q Q+ ++ D+ INE + K+ + 

Sbjct: 216 EDYADLKEKL-TDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTI 274 

Query: 761 DTSSLIINNKLICNETVE VPKDS— KSKICSE-RKRVNENE LQQDEPPAKKGS 810 

+NN+ + E ++ KD K KI + R + E E ++QD+ K 

Sbjct: 275 QQLKEQLNNQKV-EEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLE 333 

Query: 811 IHVSSAITEDQKKSEEVRP-NIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELN 869 
V TE +K E+ + ENN + L +++EL+ E E+K + 

Sbjct: 334 -EVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQ-ESEQKYNAD 391 

Query: 870 KQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNEIE 929 

++ ++++ L +T +KE + r++ + K E E+ K NE+E 

Sbjct: 392 RK-KWLEEKMML ITQAKEAENIRNK EMKKYAEDRERFFKQQNEME 435 

Query: 930 TATRSITNNVSQIKLMHTKIDEL 952 

T +T S ++ + D+L 
Sbjct: 436 ILTAQLTEKDSDLQKWREERDQL 458 

Pedant information for DKFZphtes3_35b4, frame 3 


Report for DKFZphtes3_35b4.3 

[LENGTH] 1780 
[MW] 206176.77 
[pI] 5.60 
[HOMOL] TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase phosphoprotein-1 mRNA, partial cds. 0.0 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 08.07 vesicular transport (Golgi network, etc.) [S. cerevisiae, YDL058w] 7e-30 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-30 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-21 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-15 
[FUNCAT] l genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 2e-14 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-07 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-07 
[FUNCAT] 22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06 
[FUNCAT] 05.99 other pheromone response activities [S. cerevisiae, YHR158c] 3e-06 
[FUNCAT] 05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06 
[FUNCAT] classification not yet clear-cut [S. cerevisiae, YJR134c] 2e-05 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YAL035w] 2e-04 
[FUNCAT] r general function prediction [M. jannaschii, MJ1254] 0.001 
[BLOCKS] BL00387A 
[BLOCKS] BL00411H 
[BLOCKS] BL00411G 
[BLOCKS] BL00411F 
[BLOCKS] BL00411E Kinesin motor domain proteins 
[BLOCKS] BL00411D Kinesin motor domain proteins 
[BLOCKS] BL00411C Kinesin motor domain proteins 
[BLOCKS] BL00411B Kinesin motor domain proteins 
[BLOCKS] BL00411A Kinesin motor domain proteins 
[SCOP] d2kin__ 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 2e-68 
[SCOP] d2tmab_ 1.105.4.1.1 Tropomyosin [rabbit (Oryctolagus cuniculus) 4e-05 
[SCOP] d3kar__ 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 2e-09 
[EC] 3.6.1.32 Myosin ATPase 5e-25 
[PIRKW] nucleus 4e-27 
[PIRKW] phosphotransferase 3e-16 
[PIRKW] duplication 6e-20 
[PIRKW] citrulline 6e-18 
[PIRKW] tandem repeat 4e-24 
[PIRKW] heterodimer 3e-28 
[PIRKW] endocytosis 1e-23 
[PIRKW] heart 1e-17 
[PIRKW] transmembrane protein 2e-28 
[PIRKW] serine/threonine-specific protein kinase 3e-16 
[PIRKW] zinc finger 1e-23 
[PIRKW] surface antigen 2e-ie 
[PIRKW] DNA binding 1e-25 
[PIRKW] metal binding 1e-23 
[PIRKW] muscle contraction 4e-24 
[PIRKW] heterotetramer 4e-24 
[PIRKW] acetylated amino end 2e-19 
[PIRKW] actin binding 5e-25 
[PIRKW] mitosis 3e-58 
[PIRKW] microtubule binding 3e-58 
[PIRKW] ATP 3e-58 
[PIRKW] thick filament 4e-24 
[PIRKW] phosphoprotein 9e-29 
[PIRKW] leucine zipper 1e-12 
[PIRKW] skeletal muscle 8e-24 
[PIRKW] disulfide bond 1e-12 
[PIRKW] heterotrimer 1e-29 
[PIRKW] calcium binding 6e-18 
[PIRKW] alternative splicing 4e-21 
[PIRKW] P-loop 2e-63 
[PIRKW] coiled coil 3e-58 
[PIRKW] heptad repeat 1e-25 
[PIRKW] methylated amino acid 4e-24 
[PIRKW] peripheral membrane protein 1e-23 
[PIRKW] dimer 1e-12 
[PIRKW] cardiac muscle 1e-17 
[PIRKW] hydrolase 5e-25 
[PIRKW] microtubule 6e-15 
[PIRKW] muscle 7e-23 
[PIRKW] membrane protein 6e-20 
[PIRKW] GTP binding 8e-22 
[PIRKW] EF hand 6e-18 
[PIRKW] cell division 1e-25 
[PIRKW] cytoskeleton 4e-24 
[PIRKW] hair 6e-18 
[PIRKW] Golgi apparatus 8e-24 
[PIRKW] calmodulin binding 1e-23 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-16 
[SUPFAM] myosin motor domain homology 5e-25 
[SUPFAM] alpha-actinin actin-binding domain homology 1e-13 
[SUPFAM] kinesin-related protein KIP1 9e-27 
[SUPFAM] kinesin-related protein CIN8 4e-36 
[SUPFAM] kinesin heavy chain 4e-24 
[SUPFAM] plectin 1e-13 
[SUPFAM] trichohyalin 6e-18 
[SUPFAM] kinesin-related protein KIF3 1e-29 
[SUPFAM] kinesin-related protein KIF2 3e-20 
[SUPFAM] ribosomal protein S10 homology 1e-13 
[SUPFAM] giantin 8e-24 
[SUPFAM] protein kinase homology 3e-16 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13 
[SUPFAM] kinesin-related protein unc-104 8e-26 
[SUPFAM] human early endosome antigen 1 1e-23 
[SUPFAM] unassigned kinesin-related proteins 1e-28 
[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17 
[SUPFAM] myosin heavy chain 5e-25 
[SUPFAM] conserved hypothetical P115 protein 4e-20 
[SUPFAM] centromere protein E 5e-24 
[SUPFAM] calmodulin repeat homology 6e-18 
[SUPFAM] kinesin-related protein KLP61F 1e-25 
[SUPFAM] hypothetical protein MJ0914 3e-12 
[SUPFAM] kinesin-related protein MKLP-1 2e-63 
[SUPFAM] pleckstrin repeat homology 8e-26 
[SUPFAM] hypothetical protein MJ1322 4e-13 
[SUPFAM] kinesin-related protein KIF1B 3e-28 
[SUPFAM] kinesin motor domain homology 2e-63 
[SUPFAM] kinesin-related protein KLPA 7e-25 
[SUPFAM] kinesin-related protein nodA 1e-12 
[SUPFAM] kinesin-related protein Eg5 5e-30 
[PROSITE] ATP_GTP_A 1 
[PFAM] Kinesin motor domain 
[KW] Irregular 
[KW] 3D 
[KW] LOWCOMPLEXITY 7.53 % 
[KW] COILEDCOIL 19.78 % 


SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ 

SEG 

COILS 

3kar- 

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTWLKEPQCILGRLSEKSSGQMAQKFSFSKVFG 

SEG 

COILS 

3kar- 

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF 

SEG 

COILS 

3kar- 

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL 

SEG 

COILS 

3kar- 

SEQ TNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML 

SEG 

COILS 

3kar- EBEEEEEEEEETTEEEETTTCC CCEE 

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL 

SEG 

COILS 

3kar- EEETTTTE-EEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEE 

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS 

SEG 

COILS 

3kar- E--EETTTTCEEEEEEEEEECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT 

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVC 

SEG 

COILS 

3kar- TTTT--TCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHH 

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE 

SEG xxxxxxxxxxxxxxxxxx 

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC 

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS CCCC 

3kar- 

SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY 

SEG xxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- , 


SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL 

SEG xxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG .xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE 

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN 

SEG 


COILS 

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ VVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN 

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 


Prosite for DKFZphtes3_35b4.3 


PS00017 152->160 ATP_GTP_A PDOC00017 


Pfam for DKFZphtes3_35b4.3 
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds phks 

R+RP+ + + +v + ++++ ++ + * ++ 

Query 64 RIRPFTQSEKELESEGCVHILDSQTWLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPI VDDcFhGYNCTI FAYGQTGSGKTYTM 

F+F +VF++++TQ++ +++ + V+D+++G IF+YG T SGKTYT 
Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHraGIIPRcCHDIFdrldkfqekDhdFW 

G +++GI+PR+++ +FD++ + +++ 

Query 163 QG TEENIGILPRTLNVLFDSLQERL-YTKMNLKPHRSREYLRLSSE 207 

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257 

HMM hVkCSYMEIYNEelYDLLCPnP. . . qhMkpLnlHEHPN 

+V +S++EIYNE+IYDL +P++ 0++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVK 307 

HMM MGpYVqCCTEfHVcSYeDachWIWqGnknRHVAaTnMNdhSSRSHtlFTI 
++++++ V +A +++ +G K+ VA T++N SSRSH+IFT+ 
Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk . qcdehvcHSKMNLVDLAGSERvnrTGAEGQRlKEGcNINqSL 

++ Q + + +++S ++L DLAGSER+ +T+ EG RL+E +NIN SL 
Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407 

HMM ttLGnVInaLaDgqTKYmYgghgHIPYRDSKLTWlLQDSLGGNcKTcMIA 
+fLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI + 
Query 408 LTLGKCINVLKNSE KSKFQQHVPFRESKLTHYFQSFFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnIkNkPQINEDPca* 

+ 1+ + Y+ETL4-+L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491 
group: metabolism 

DKFZphtes3_35b5 encodes a novel 466 amino acid protein, with similarity to bovine accessory 
subunit for vacuolar ATPase and rat C7-1 protein. 

The vacuolar proton-ATPase (V-ATPase) translocates protons into intracellular organelles or 
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of 
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 seems 
to be enriched in aged adult rats in the frontal cortex. 

The novel protein can find application in modulating the V-ATPase activity in endocytic and 
secretory organelles. 


strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain A 

complete cDNA, complete cds, potential start at bp 8, EST hits; 
matches hypothetical protein 154197 perfectly, but possesses 186 
additional aa at the N-terminus 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 


1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 
101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 
151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 
201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 
251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 
301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 
351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 
401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 
451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 
501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 
551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 
601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 
651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 
701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 
751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 
801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 
851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 
901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 
951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 
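
The polyadenylation signal reported for this insert can be checked against the printed sequence. The following is a minimal sketch, not the annotation pipeline used for the listing: the function name `find_polya_signals` is hypothetical, and restricting the search to the hexamers AATAAA and ATTAAA is an assumption. The sketch returns 0-based offsets; reading the listing's reported positions as 0-based offsets is likewise an inference from the printed rows, not something the listing states.

```python
import re

# Canonical hexamer and its most common variant; limiting the search to
# these two motifs is an assumption of this sketch.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, offset=0):
    """Return sorted 0-based offsets of poly(A) signal hexamers in seq.

    offset is the 0-based index of seq[0] within the full insert, so a
    single printed row of the listing can be scanned on its own.
    """
    # Drop the spaces (and any digits) used to group the printed sequence.
    seq = re.sub(r"[^A-Za-z]", "", seq).upper()
    hits = []
    for hexamer in POLYA_SIGNALS:
        start = seq.find(hexamer)
        while start != -1:
            hits.append(offset + start)
            # Advance by one so overlapping occurrences are also reported.
            start = seq.find(hexamer, start + 1)
    return sorted(hits)


# Last printed row of the 2043 bp insert (row label 2001 in the listing).
tail_35b5 = "CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA"
print(find_polya_signals(tail_35b5, offset=2000))
```

Scanning only the final row is enough here because the reported signal lies within it; the whole insert can be passed with `offset=0` instead.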


BLAST Results 


No BLAST result 
Medline entries 

95014142: 

[...] subunit for vacuolar H(+)-ATPase from chromaffin granules. 

97215246: 

[...] brain gene associated with aging by 
PCR differential display method. 


Peptide information for frame 2 

ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
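
The stated peptide length follows directly from the ORF coordinates. As a small sketch (the helper name `orf_peptide_length` is hypothetical), assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range, which is consistent with the figures given in this listing:

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon is not part of the stated range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 8 bp to 1405 bp, as stated above.
print(orf_peptide_length(8, 1405))
```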

1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 
TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus 

hypothetical protein - human, N = 1, Score = 1464 
Length = 463 

HSPs: 

Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216 
Identities = 408/463 (88%), Positives = 426/463 (92%) 

^^y^^'^fJQ^J^^PW^PV^LSLAAAAAAAAAEQQVPLVLWSSDR^^^^ 63 

b R A LW + LSL A AAA AAECX)VPLVLWSSDRDLWAP ADTHFrw 

SRIRTGTRWAPVLW LLLSLVAVAAAVSAESvPLVLMsioRDLSAPVAD^HEGH 61 

J™f°^S''^''^^''^*^^"5P'^^''™DKLSIEDFrAyGGVrcNKQDSAFSNLENALDLA 123 
62 ITSDMQLSTYLOPAI,EaGPWIVLLFWDKLSIEDFTAYGevntMirnn««St^f„?rS^ , 


Query: 

4 

Sbjct : 

8 

Query: 

64 

Sbjct: 

62 

Query: 

124 

Sbjct: 

122 

Query: 

184 

Sbjct: 

182 

Query: 

244 

Sbjct: 

242 

Query: 

304 


«!fJ*!^J'!*^°"^*^^'^^''^^^<^^KLGASPLHVDLATLRELKLNASLPALLLIRLPYTAS 
^ff^^^P^^^W^^+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLlS^^ 
PSSLVLPAVDWYAISTLTTYLOEKLGASPT.HvnT.aTT.«f=.i I.T . , ™ 


^ocT„ '^'^ ''"^''^^^^^^^Q^^^^^SPLHVDLATL+ELKLNASLPALLLIRLPYTAS 

psslvlpavdwyaistlttylqeklgasplhvdiatlkelklnaslpallliSpy?as 

^J^I^n^^y^'^^^^^^^^^^^^'^^^^^'^^PYTAALTAVRPSRVARDVAWAGGLGRQLL 
rJ1^o^,^^''^'''''^'''^^^''^^''^^^°^^^^'^^TAVRPSRVARDVAWAGGL^ 

glmaprevltgndevigqvlstlesedvpytaaltavrpsrvardvamvagglgrqll. 


is 183 
PYTASS 

IS 181 


iQT 241 

^~ Z~ ""'"'^"**'«^""W"'^vftiivuywtULTPLTFGVQELNLTGSFWNDSFA 303 

Q SP IHPPVSYNDTAPRILFWAQHFSVAYKD+W+OLT LTFGV+ LNLTGSnraOSFA 
QVASPAIHPPVSYNDTAPRILFWAQNFSVAYKDBWKDLTSLTFGVENUILTGS^ 301 

Query: 304 RLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGPSIY 363 
LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+FN SQVTGPSIY 
Sbjct: 302 MLSLTYEPLFGATVTFKFILASRFYPVSARYWFTMERLEIHSNGSVAHFNVSQVTGPSIY 361 

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423 

SFHCEYVSSLSKKGSLLV PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW 
Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420 

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466 

MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV 
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463 
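
The percentages printed next to the identity and positive counts can be reproduced from the raw counts. The sketch below uses a hypothetical helper, `blast_percent`, and assumes the percentage is truncated to a whole number rather than rounded; that assumption is inferred from the figures in this listing (for example, 164/684 is printed as 23%), not from any statement in the report.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Identities = 408/463 and Positives = 426/463 for the C7-1 alignment.
print(blast_percent(408, 463), blast_percent(426, 463))
```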


Pedant information for DKFZphtes3_35b5, frame 2 


Report for DKFZphtes3_35b5.2 

[LENGTH] 466 
[MW] 51621.44 
[pI] 5.73 
[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds 0.0 
[PIRKW] hydrolase 0.0 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] SIGNAL_PEPTIDE 38 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 11.59 % 


SEQ MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH 

SEG xxxxxxxxx 

PRD ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc 

MEM 

SEQ EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL 

SEG 

PRD ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc 

MEM 

SEQ DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT 

SEG xxxxxxxxxxxxxxx . . . 

PRD ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc 

MEM 

SEQ ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL 

SEG xxxxxxxxxxxxxxxxxxxx . . 

PRD cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh 

MEM 

SEQ LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND 

SEG 

PRD hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc 

MEM 

SEQ SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP 

SEG 

PRD hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc 

MEM 

SEQ SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP 

SEG xxxxxxxxxx 

PRD ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc 

MEM MMMMMM 

SEQ GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 

SEG ; 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccceeeeccc 

MEM -MMMMMMMMMMMMMMMMMMMMMMM . 


Prosite for DKFZphtes3_35b5.2 

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001 
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001 
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001 
PS00001 292->296 ASN_GLYCOSYLATION PDOC00001 
PS00001 299->303 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00004 375->379 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005 
PS00005 374->377 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007 
PS00008 102->108 MYRISTYL PDOC00008 
PS00008 103->109 MYRISTYL PDOC00008 
PS00008 200->206 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 314->320 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphtes3_35b5.2) 
DKFZphtes3_35e21 


group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to 
interleukin-7. 

Due to the close relationship to human interleukin-7, the novel interleukin is expected to act 
as a new growth factor for human B lineage cells. Additionally, the protein should induce the 
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and 
subsequently induce both cytotoxic T-cell- and lymphocyte-activated killer cells. 

This new interleukin could find clinical application in a variety of conditions of 
hematolymphopoietic failure and different tumours, because of its recruitment of B cell 
lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells. 


similarity to interleukin-7 precursor 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 


1 GGATGAAAGT GATTTAATTC ATTTTTAGAA TTTTTTTTTT GTTTTGTTTT 
51 AGCAACATGC TGAACAACTA ATTTACTTTA AAAATAAGCC AGTTAAAACA 
101 AAGGACGCTA AGCCCAAGTG GGGGGCAATA TTAGTCAGGA TCTTTGGGGT 
151 CTAATTCCAG ACCAACTTTC AGAAGCACTT CTTTGTCTCT GTTCTCACCT 
201 CTGCTGTCCC TCTCTTCCCT CATCCCCTAA GAGAGACAAA GATAAAAGCC 
251 CACCTGCATC CCTAAGTCTT ACTGAGATCA GCCACCCCAG GGGAGAGAAA 
301 CTGGATCTAC TTACAGCCAC CCCCTGTTTC CATCCATATA CTTACTTCCC 
351 CCAATTTGCA TGTGATTATG GAAACAAGTC ATGCTCATGA AAGCAACTGT 
401 AAAATAAAAG GTTATGGAGT AGTTCAGCAA CTTCTTCACA GCCAGCTTTG 
451 TGGAGCTGGG GAGGACTTAG GGCCCATTGG AGTCTCTTAT GTGTACAGCT 
501 TCAGGGCTGT CCCTTTCAGT TTGATTTTAA GCAATGCCTC ACTTCATAGC 
551 TTAGGGGGTA AGGATTCCAT TCAGGTAGGT TGTCTAAAGG AACTAATGGG 
601 ACCTCTCAGT GAATTAGCTG ACCAGATTTT AGGAAATCTT TTTAATTTCT 
651 ATGATTTTCC TTCTCACATT TTGAAATGGT AAAATTGACT GGAAATAATT 
701 TTTCTTGGTG CCTTATTGGT TTTCCTTGCA AACCTTTCTC ATATTTTCTC 
751 ATGACCATTG CCAGTGACCA AGGCCCATGT GTGTGTTGTG TGTAATTGTG 
801 GGCATGTACA AGCTTAAATA ACGTGCCGAC AGCACTGTTT CAAAGTTGGT 
851 ATTCATTAGG CTGTTGCCTC CTGGGCTGGA GCTGCGCTAA TCCTGACACC 
901 GGCTGCCAGG AGAAAACCTC ATGGATCACA CACCAAACCT TAATAACAGC 
951 ATCCGTGACC TGCACTCTCC AGTACAGAAT GGGAACCCCA GAGCTAGGAA 
1001 ATGTAGTTGT ATATTTTAAT GAACTGCTAC CCCAGCCAAA GAAGCTTCTT 
1051 TCACTTTTGT GCTCTACAGA AAGCCCAAGG GGGGTAGGAG GGACAGAGCT 
1101 TTGAATAACT GCTTTCTAAC ACTAAATGTG GCCAACAGGA CAGAGCACAT 
1151 CACACGTATA GGCAGGTGTG AGGGACAGTG GCTAAGAATT GCCTGCTCCC 
1201 TCTGCATGCT CTTTCTTGTT TCCAAAGTCC AATCAAGTGA TCCTGGGAAA 
1251 CAAATCTGTC TGGATTGCGG AGGGTGGTTC TGAAAGAACT GCCAAGACGT 
1301 TAAAGAAGGG TGAAGAGTAG GCAGAATATA AGTAGCTAAC CTGAGTCAAG 
1351 ACTCTCAAAA GCTAGCAGCC TGATGACAAT AGGATTTATT TCAGCCAGGA 
1401 TAGTGTCTGT CTGTGAGTGC ATCATTTTAA GACAGTATGA CTTCATGTTG 
1451 TTACAAACTA TGTATAGTAT GTATGTTTTG TGGGTTGTAT ATATACATAA 
1501 TATATATTAT ATATATATAT GAGAGATTTG GTGACTTTTG ATACGGGTTT 
1551 GGTGCAGGTG AATTTATTAC TGAGCCAAAT GAGGCACATA CCGAGTCAGT 
1601 AGTTGAAGTC CAGGGCATTC GATACTGTTT ATGATTTCCA TATATGTATA 
1651 GTGCCTATCC CATGCTGTAG TCACTGTTAT GTTAAATCCA GAAGTTACAC 
1701 TAGAGCCAGC GATACTTTAT TTGTAGACAA TCAATTTGAA TCCATATGTT 
1751 ATTACTGGCA GATGATACAT GATTACAGTT CTGAATCTGT AACACTTACA 
1801 AAAGGAAACC CAGAGCAGCT TGATGAGTTT TTGTTTCTGC TTCGTTCCTG 
1851 GGAGTCAGTA GAAACAGCAG TTGTATGTGG TTATGTTAGT CTCAAGATAC 
1901 TTAATTTGTT GACCTTACTT CAGAAAAATT TTGTATGTAT TATATTTGTG 
1951 GGAAGGTAAA ATAATCATTT GAGATTTTTA TCAAATATGA AGATTAGTTA 
2001 TTTATGAAAA ACAAAGAAAT GTCTATTTTT CTTTGTTCCC AATTAATGTA 
2051 GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA 


BLAST Results 


No BLAST result 
Medline entries 

89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 


Peptide information for frame 2 


ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 


1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score = 66, P = 7.0e-01, identities = 21/70, positives = 33/70 


Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score = 
66, P = 0.72 

TREMBL:PADAL1_1 gene: "dal1"; P. abies dal1 mRNA, N = 2, Score = 59, P 

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score = 
66, P = 0.79 

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus 
radiata MADS-box protein (PrMADS3) mRNA, complete cds, N = 2, Score = 
59, P = 0.94 

>PIR:B32223 interleukin-7 precursor (clone 1) - human 
Length = 133 

HSPs: 

Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 21/68 (30%), Positives = 33/68 (48%) 

Query: 39 VSYVYSFRAVPFSLIL-----SNASLHSLGGK--DSIQVGCLKELMGPLSELADQILGNL 91 

VS+ Y F P L+L S+ + GK +S+ + + +L+ + E+ L N 

Sbjct: 4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 

FNF F HI 
Sbjct: 64 FNF--FKRHI 71 

Pedant information for DKFZphtes3_35e21, frame 2 
Report for DKFZphtes3_35e21.2 

[LENGTH] 104 
[MW] 11339.12 
[pI] 5.87 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 

SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH 
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc 
SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW 
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc 


Prosite for DKFZphtes3_35e21.2 


PS00001 56->60 ASN_GLYCOSYLATION PDOC00001 

PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005 

PS00008 63->69 MYRISTYL PDOC00008 

PS00008 89->95 MYRISTYL PDOC00008 
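
The ASN_GLYCOSYLATION entry above can be cross-checked with a short scan. The following is a minimal sketch, not the Pedant or Prosite implementation: it assumes the standard PS00001 consensus N-{P}-[ST]-{P} and applies it to the 104-residue frame 2 peptide listed for this clone. The function and variable names are illustrative only.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION) consensus: N-{P}-[ST]-{P}.
# The zero-width lookahead lets overlapping candidate sites be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(peptide: str) -> list[int]:
    """Return 1-based start positions of N-{P}-[ST]-{P} matches."""
    return [m.start() + 1 for m in N_GLYC.finditer(peptide.upper())]


# Frame 2 peptide of this clone as listed in the report (104 residues).
PEPTIDE = (
    "METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPF"
    "SLILSNASLHSLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSH"
    "ILKW"
)

sites = n_glycosylation_sites(PEPTIDE)
print(sites)  # a single site starting at residue 56 (motif NASL)
```

The scan finds one site starting at residue 56, in agreement with the single ASN_GLYCOSYLATION hit reported. The report's range 56->60 appears to give an end position one past the last residue of the four-residue motif; that reading is an inference from the listed data, not a documented convention.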


(No Pfam data available for DKFZphtes3_35e21.2) 

DKFZphtes3_35g6 
group: testes derived 

DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

strong similarity to R27216_1 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="15" 
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 

1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 
51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 
101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 
151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 
201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 
251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 
301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 
351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 
401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 
451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 
501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 
551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 
601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 
651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 
701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 
751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 
801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 
851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 
901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 
951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 
1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 
1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 
1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 
1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 
1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 
1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 
1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 
1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 
1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 
1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 
1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 
1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 
1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 
1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 
1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 
1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 
1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 
1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 
1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 
1951 AAGAGATGGG TCAGTATTCG TACAGAATTC TTATTAACTC AAATAACTAA 
2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 
2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 
2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 
2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 
2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 
2251 AACTGCACCA CTATACCTGT CTCTCTGTGT GGGGGACACT GCTGATGATT 
2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 
2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 
2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 
2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 
2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 
2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 
2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 
2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 
2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 
2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 
2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 
2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 
2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 
2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 
3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 
3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 
3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 
3151 TAAATATATA TATATATAAA AAAAAAA 


Entry G37753 from database EMBL: 
SHGC-63477 Human Homo sapiens STS genomic. 
Score = 1627, P = 3.0e-66, identities = 327/329 

Entry G37752 from database EMBL: 
SHGC-63476 Human Homo sapiens STS genomic. 
Score = 1578, P = 6.2e-64, identities = 320/324 


ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 


1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA 
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG 
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT 
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL 
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA 
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV 
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR 
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC 
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA 
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT 


Entry AC005306_2 from database TREMBL: 

product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, 
complete sequence. 

Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297 

Entry CEF38H4_9 from database TREMBLNEW: 

gene: "F38H4-7"; Caenorhabditis elegans cosmid F38H4 

Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446 

Entry AC004678_1 from database TREMBL: 

product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094, 
complete sequence. 

Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137 


BLAST Results 


Medline entries 


No Medline entry 


Peptide information for frame 3 


BLASTP hits 


Alert BLASTP hits for DKFZphtes3_35g6, frame 3 


No Alert BLASTP hits found 


Pedant information for DKFZphtes3_35g6, frame 3 


Report for DKFZphtes3_35g6.3 


[LENGTH] 482 
[MW] 52771.47 
[pI] 5.79 
[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence. 1e-142 
[BLOCKS] BL01075D Acetate and butyrate kinases family proteins 
[SUPFAM] poz domain homology 3e-08 
[SUPFAM] A55R protein middle region homology 5e-06 
[SUPFAM] A55R protein 5e-06 
[SUPFAM] A55R protein carboxyl-terminal homology 5e-06 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.20 % 

SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE 
SEG xxxxxxxxxxx 

PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee 

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA 
SEG 

PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch 

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh 

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV 
SEG 

PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceeecccccccccccccchhhhh 

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS 
SEG 

PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee 

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN 

PRD eeeccccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc 

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF 

SEG xxxxxx 

PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec 

SEQ YT 
SEG 

PRD cc 


Prosite for DKFZphtes3_35g6.3 

PS00001 394->398 ASN_GLYCOSYLATION PDOC00001 
PS00001 466->470 ASN_GLYCOSYLATION PDOC00001 
PS00004 357->361 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 387->391 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00005 234->237 PKC_PHOSPHO_SITE PDOC00005 
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005 
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005 
PS00005 406->409 PKC_PHOSPHO_SITE PDOC00005 
PS00005 428->431 PKC_PHOSPHO_SITE PDOC00005 
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 115->119 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006 
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006 
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006 
PS00007 430->437 TYR_PHOSPHO_SITE PDOC00007 
PS00008 80->86 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 365->371 MYRISTYL PDOC00008 
PS00008 392->398 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 463->469 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_35g6.3) 

DKFZphtes3_35k16 


group: metabolism 

DKFZphtes3_35k16 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA 
synthetases/ligases. 

The novel protein contains a putative AMP-binding domain signature, which is present in 
enzymes that act via ATP-dependent covalent binding of AMP to their substrate. This 
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1), 
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and [illegible] ligase. Therefore it is a 
new fatty acid-CoA synthetase/ligase with unknown substrate. 

The new protein can find application in modulation of fatty acid metabolism and as a new 
enzyme for biotechnological production processes. 


similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at bp 50, 
few EST hits, seems to be a testis-specific cDNA, 
5 of 6 EST hits are from testis-derived libraries 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490 


1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA 
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC 
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG 
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA 
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT 
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA 
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG 
351 GTTTGGAGCG TTTCCACGGA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG 
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG 
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG 
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC 
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA 
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA 
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG 
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC 
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG 
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG 
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG 
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC 
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC 
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA 
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA 
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT 
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC 
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC 
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC 
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA 
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA 
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG 
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA 
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG 
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC 
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA 
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA 
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC 
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG 
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC 
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA 
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA 
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC 
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA 
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA 
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG 
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG 
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA 
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC 
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT 
2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA 
2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 
2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC 
2501 TTCAGGGTCC AAAAAAAAAA 
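
The poly(A) and polyadenylation-signal positions given in the clone header can be re-derived from the 3' end of the listed insert. The sketch below is an illustration under stated assumptions, not the annotation pipeline used for the report: it assumes the canonical AATAAA hexamer as the signal, takes the uninterrupted run of A's at the very end as the poly(A) stretch, and reports zero-based offsets. The function name `polya_features` and the `min_tail` threshold are hypothetical.

```python
def polya_features(seq: str, min_tail: int = 6):
    """Locate the terminal A run and the last AATAAA hexamer upstream of it.

    Returns (tail_start, signal_start) as zero-based offsets into seq.
    Both are None if seq does not end in at least min_tail A's;
    signal_start is None if no AATAAA precedes the tail.
    """
    s = seq.upper()
    tail_start = len(s.rstrip("A"))          # index of first A in the terminal run
    if len(s) - tail_start < min_tail:
        return None, None
    signal_start = s.rfind("AATAAA", 0, tail_start)
    return tail_start, (signal_start if signal_start >= 0 else None)


# Bases 2451-2520 of the insert listed above (the last 70 bases).
TAIL_SEQ = (
    "GCCTGGAAGCAACTGGGAAACCCTTCCAATAAGTCCTGATAATAAAGCAC"
    "TTCAGGGTCCAAAAAAAAAA"
)
OFFSET = 2450  # zero-based offset of the first base of TAIL_SEQ within the insert

tail_start, signal_start = polya_features(TAIL_SEQ)
tail_pos = OFFSET + tail_start
signal_pos = OFFSET + signal_start
print(tail_pos, signal_pos)
```

For this insert the sketch yields 2510 for the poly(A) stretch and 2490 for the signal, matching the header values when they are read as zero-based offsets. That reading is inferred from the listed sequence rather than stated in the report.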


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 
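
The ORF coordinates, reading frame and peptide length quoted in these reports are tied together by simple arithmetic. As a minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon (which is what the listed values are consistent with), the frame and length can be recomputed as follows. The helper name `orf_summary` is illustrative.

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (frame, peptide_length) for an ORF.

    Coordinates are 1-based and inclusive; the stop codon is assumed
    to lie outside the quoted range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start_bp - 1) % 3 + 1   # frame 1 starts at bp 1, 4, 7, ...
    return frame, span // 3


frame, length = orf_summary(50, 2047)
print(frame, length)  # frame 2, 666 residues
```

The same check applied to the other complete ORFs in this section (368 to 679 bp, and 67 to 1608 bp) gives frame 2 with 104 residues and frame 1 with 514 residues, in line with the reported values.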


1 MTGTPKTQEG AKDLEVDMNK TEVTPRLWTT CRDGEVLLRL SKHGPGHETP 
51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL 
101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH 
151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF 
201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI 
251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA 
301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR 
351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP 
401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK 
451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL 
501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL 
551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD 
601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR 
651 HFVAQKYKKQ IDHMYH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k16, frame 2 

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, P 
= 8.9e-169 

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37Rv), 
N = 2, Score = 532, P = 3.6e-62 

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus 
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59 


>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds. 
Length = 634 

HSPs: 

Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169 
Identities = 319/628 (50%), Positives = 440/628 (70%) 

Query: 38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97 

LR+ P + P T+ F E+++++G AL K KWE ++++QYY R+AAK 
Sbjct: 2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59 

Query: 98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157 

+KLGL++ H V ILGFNS EWF +AVG + AGG+ GIY T+S EACQY+ N+++ 
Sbjct: 60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119 

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216 

V+ +QL+KIL I L LKA++ Y+ P K N+Y+ ++FMELG +P+ L+ +I 
Sbjct: 120 VDTQKQLEKILKI-WKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178 

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273 

++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A G+ D + + + E VVSYLPL 
Sbjct: 179 

Query: 274 

SHIAAQ+ D+W I+ GA FA+ DALKG+LV+TL+EV+PT +GVP++WEKI E +++ 
Sbjct: 239 

Query: 334 

+A+S +++K +WA ++ + N P + R+A LV +KV+ +LG 
Sbjct: 299 

Query: 394 

G AP+ ET FFL L+I + YGLSE+SGPH +S+ NYRL S GK++ GC+ L 
Sbjct: 358 

Query: 454 

Q+ +GIGEICLWGR IFMGYL E +T EAID+EGWLH+GD G+LD GFLY+TG +K 
Sbjct: 418 

Query: 514 

E++ITAGGENVPP+P+E VK ++PI ISNAML+GD+ KFLSMLLTLKC ++ + + 
Sbjct: 478 

Query: 574 

L +A+ FC+ +GS+A+TV+E+++++D VY+AI++GI VN A 
Sbjct: 538 

Query: 634 

I+KW ILE+ DFSI GGELGP MKLKR V +KYK ID Y 
Sbjct: 598 


Pedant information for DKFZphtes3_35k16, frame 2 


Report for DKFZphtes3_35k16.2 


[LENGTH] 666 
[MW] 74344.97 
[pI] 8.67 
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds. 1e-176 
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23 
[BLOCKS] BL00455 
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49 
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17 
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34 
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08 
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18 
[PIRKW] duplication 6e-07 
[PIRKW] phosphopantetheine 3e-12 
[PIRKW] multifunctional enzyme 3e-06 
[PIRKW] ligase 6e-08 
[PIRKW] acid-thiol ligase 4e-34 
[PIRKW] transmembrane protein 5e-22 
[PIRKW] monooxygenase 9e-17 
[PIRKW] hydrolase 4e-34 
[PIRKW] peroxisome 9e-15 
[PIRKW] antibiotic biosynthesis 3e-12 
[PIRKW] isomerase 6e-08 
[PIRKW] flavonoid biosynthesis 1e-17 
[PIRKW] magnesium 9e-15 
[PIRKW] ATP 5e-22 
[PIRKW] oxidoreductase 9e-17 
[PIRKW] liver 2e-31 
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07 
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34 
[SUPFAM] gramicidin S synthetase I 6e-08 
[SUPFAM] peptide synthetase ppsE 7e-06 
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12 
[SUPFAM] peptide synthetase ppsD 2e-07 
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09 
[SUPFAM] acetate--CoA ligase 8e-10 
[SUPFAM] acetate--CoA ligase homology 4e-54 
[SUPFAM] surfactin synthetase 3e-12 
[SUPFAM] 4-coumarate--CoA ligase 8e-18 
[SUPFAM] short-chain alcohol dehydrogenase homology 8e-07 
[SUPFAM] acyl carrier protein homology 2e-29 
[PROSITE] MYRISTYL 12 
[PROSITE] AMP_BINDING 1 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] AMP-binding enzymes 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 1.80 % 

SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES 
SEG 
1lci- 

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI 
SEG 
1lci- !!!!!!!!!!!!!!!!!!!!!!! 

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA 
SEG 
1lci- !!!!!!!!!!!!!!! 

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV 
SEG 
1lci- i i ] 

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA 
SEG 
1lci- 

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK 
SEG 
1lci- iiiliil!!!!!! !!!!!!!!!!!!!! !!!!!!! 

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY 
SEG 
1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE 

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET 
SEG 
1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEErrTTTCCEETT^ 

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII 
SEG xxxxxxxxxxxx 
1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEETCHHHHHHHHHHT-TTE 

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD 
1lci- EEEEEEE 

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ 
SEG 
1lci- i i ! i i i !!!!! i !!!!!!!!! ! 

SEQ IDHMYH 
SEG 
1lci- 


Prosite for DKFZphtes3_35k16.2 

PS00001 19->23 ASN_GLYCOSYLATION PDOC00001 
PS00001 246->250 ASN_GLYCOSYLATION PDOC00001 
PS00004 332->336 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005 
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 308->311 PKC_PHOSPHO_SITE PDOC00005 
PS00005 335->338 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 558->561 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006 
PS00006 173->177 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 478->482 CK2_PHOSPHO_SITE PDOC00006 
PS00006 591->595 CK2_PHOSPHO_SITE PDOC00006 
PS00007 659->666 TYR_PHOSPHO_SITE PDOC00007 
PS00007 658->666 TYR_PHOSPHO_SITE PDOC00007 
PS00007 597->605 TYR_PHOSPHO_SITE PDOC00007 
PS00008 3->9 MYRISTYL PDOC00008 
PS00008 65->71 MYRISTYL PDOC00008 
PS00008 124->130 MYRISTYL PDOC00008 
PS00008 130->136 MYRISTYL PDOC00008 
PS00008 134->140 MYRISTYL PDOC00008 
PS00008 235->241 MYRISTYL PDOC00008 
PS00008 239->245 MYRISTYL PDOC00008 
PS00008 303->309 MYRISTYL PDOC00008 
PS00008 387->393 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 498->504 MYRISTYL PDOC00008 
PS00008 586->592 MYRISTYL PDOC00008 
PS00009 74->78 AMIDATION PDOC00009 
PS00455 227->239 AMP_BINDING PDOC00427 


Pfam for DKFZphtes3_35k16.2 


HMM_NAME AMP-binding enzymes 

HMM *TyRELNERANRLARHLRsekGIrPGDiVgIMMDRSMWMIVaMLGIWKAG 
+ + +E +A L+ +G VGI+ +S + G + AG 

Query 82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129 

HMM GAYVPIDPeYPdERIqYMLEDSGArLLITQrh HmqRIPdemwwvdH 

G +V I +E QY++ ++ + +L+++ + + IP++++ + 

Query 130 GLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179 

HMM liviDWe WddlWWHedeeNpqpWvdPeDLAYIIY 

+I++ + + ++++ + E ++ ++++ A +IY 

Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229 

HMM TSGTTGKPKGVMIEHrNIvNycqWMnWRYgMteeDDRILWFtSDpYWFDa 
TSGTTG PKGVM++H NI+ + +++ +T+ +++ + + ++ A 
Query 230 TSGTTGIPKGVMLSHDNITWIAGAVTKDFKLTDKHETVVSYLP-LSHIAA 278 

HMM SVHDMFHpLLnGaTLYIpPeEtRrDPerWWqYIqRHglTWWylTPSHFRM 

+++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++ 

Query 279 QMMDIWVPIKIGALTYFAQADAL--KGTLVSTLKEVKPTVFIGVPQIWEK 326 

HMM LMpd 

+ + 

Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 376 

HMM psLRh VMFgGEpLs PehWdWWRk r f g f kgRI INMYWPT 

++ + +++G PL++E+++ ++ + ++I Y+ + 
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD--IPIGELYGLS 423 

HMM ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQlQPiGViGE 
E++ T+ + + R +++G+ + + + + +N G IGE 

Query 424 ESSGPHTISNQNN— Y RLLSCGKILTGCKNMLFQQN KDG-IGE 463 

HMM LYIgGWPGVARGYWNRPELTEERFipNPFWPGEYRrGWNrRMYRTGDLAR 
+++ G ++ GY+ + +T E+ + ++ ++GDL++ 
Query 464 ICLWG-RHIFMGYLESETETTEAIDDEGW LHSGDLGQ 499 

HMM WlPDGnlEYLGRID . DQVKIRGYRIELGEIEhqLr . qHPglqEAW* 

+ G+++ G I + G+++ + +E+ + ++P 1+ A 
Query 500 LDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPIISNAML 545 

DKFZphtes3_35k24 


group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins. 

The novel protein contains 5 transmembrane regions. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 


unknown 

membrane regions: 5 

Summary DKFZphtes3_35k24 encodes a novel 514 amino acid protein. 
No homologues found in bacteria, yeast and C. elegans; specific for 
mammals? 


unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2706 bp 

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675 


1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 
51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 
101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 
151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 
201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 
251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 
301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 
351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 
401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 
451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT 
501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 
551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 
601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 
651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 
701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 
751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT 
801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 
851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 
901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 
951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 
1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 
1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA 
1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA 
1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC 
1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT 
1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT 
1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC 
1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 
1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 
1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC 
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC 
1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC 
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAACCT 
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA 
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC 
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC 
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG 
2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT 
2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG 
2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC 
2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT 
2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG 
2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT 
2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA 
2701 AAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 


1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC 
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM 
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE 
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT 
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD 
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG 
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK 
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS 
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR 
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE 
501 ADQDPTTSKS TPTN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k24, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35k24, frame 1 


Report for DKFZphtes3_35k24.1 


[LENGTH] 514 
[MW] 60185.03 
[pI] 8.67 
[PROSITE] MYRISTYL 5 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 6 
[KW] SIGNAL_PEPTIDE 32 
[KW] TRANSMEMBRANE 5 
[KW] LOW_COMPLEXITY 15.37 % 


SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR 
SEG 
PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc 
MEM 

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF 
SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxx 
PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 
MEM MMMMMMMMMMMMMMMMM MMMMM 

SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP 
SEG xxx 
PRD hhhhhhhhhhccccccceeeeecccccchhhhhhhhhhccccccccchhhhhhhhhhccc 
MEM MMMMMMMMMMMM 

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA 
SEG xxxxxxxxxxxxxxxxxxxxx 
PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh 
MEM MMMMMMMMMMMMMMMMM M 

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG 
SEG xxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc 
MEM MMMMMMMMMMMMMMMM 

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE 
SEG 
PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh 
MEM 

SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN 
SEG xxxxxxxxxxxxxx 
PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc 
MEM MMMMMMMMMMMMMMMMM 

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS 
SEG 
PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec 
MEM 

SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN 
SEG 
PRD cccccccccccccccccccccccccccccccccc 
MEM 


Prosite for DKFZphtes3_35k24.1 

PS00001 149->153 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00001 364->368 ASN_GLYCOSYLATION PDOC00001 
PS00001 371->375 ASN_GLYCOSYLATION PDOC00001 
PS00001 487->491 ASN_GLYCOSYLATION PDOC00001 
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001 
PS00004 435->439 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005 
PS00005 187->190 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005 
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006 
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006 
PS00006 438->442 CK2_PHOSPHO_SITE PDOC00006 
PS00006 456->460 CK2_PHOSPHO_SITE PDOC00006 
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006 
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006 
PS00007 326->334 TYR_PHOSPHO_SITE PDOC00007 
PS00008 48->54 MYRISTYL PDOC00008 
PS00008 79->85 MYRISTYL PDOC00008 
PS00008 106->112 MYRISTYL PDOC00008 
PS00008 134->140 MYRISTYL PDOC00008 
PS00008 159->165 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_35k24.1) 

group: metabolism 

DKFZphtes3_35n12 encodes a novel 315 amino acid protein with strong similarity to ADP,ATP
carrier T1 (ANT) proteins.

The novel protein contains three mitochondrial energy transfer signatures and is closely
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein most
abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore
through which ADP is moved into the mitochondrial matrix and ATP is moved from the matrix
into the cytoplasm.

The new protein can find application in modulation of ADP transport and energy metabolism in
cells/mitochondria.


strong similarity to ADP/ATP carrier proteins
EST hits to mouse and Drosophila
Sequenced by DKFZ

Locus: unknown

Insert length: 1803 bp

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772


1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATCK;TGCA GCGGTTTTCC 
51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG 
101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA 
151 GAAGGCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC 

201 tggccggcgg agtcgcggca gctgtgtcca AGACAGCGGT GGCGCCCATC
251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG gcgtcgtcga agcagatcag 
301 ccccgaggcg cggtacaaag gcatggtgga ctgcctggtg cggattcctc 
351 gcgagcaggg tttcttcagt ttttggcgtg gcaatttggc aaatgttatt 
401 cggtattttc caacacaagc tctaaacttt gcttttaagg acaaatacaa 
451 gcagctattc atgtctggag ttaataaaga aaaacagttc tggaggtggt
501 ttttggcaaa cctggcttct ggtggagctg ctggggcaac atccttatgt
551 gtagtatatc ctctagattt tgcccgaacc cgattaggtg tcgatattgg
601 aaaaggtcct gaggagcgac aattcaaggg tttaggtgac tgtattatga 
651 aaatagcaaa atcagatgga attgctggtt tataccaagg gtttggtgtt 

701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA 
751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT 
801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT 
851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA 
901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAAGATA TACCAACATG
951 AAGGAATCAG TTCCTTTTTT CGT(KK:GCCrr TCTCCAATGT TCTTCGCGGT 
1001 ACAGGGGGTG GTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT 
1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC 
1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC 
1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT 
1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG 
1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT 
1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTA7UUVTTCT TTTTATGATT 
1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA 
1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT 
1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC 
1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT tCTATATCTC TTCTAAGACA 
1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT 
1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA 
1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA 
1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA 
1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA 
1801 AAA 


BLAST Results


No BLAST result


Medline entries 


96289608: 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters. 
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Peptide information for frame 2 


ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50) 

MITOCH_CARRIER (145-155) 

MITOCH_CARRIER (242-252) 


1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV
51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN
101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR
151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR
201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM
251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL
301 YDKIKEFFHI DIGGR 
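The ORF coordinates and the peptide length quoted in these reports are linked by simple arithmetic, sketched below in Python. The sketch assumes the coordinates are 1-based and inclusive and that the quoted end position is the last base of the final sense codon, so the stop codon is not counted; that reading is inferred from the numbers in this section and is not stated in the reports. The function name `peptide_length` is illustrative only.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF given 1-based, inclusive coordinates."""
    span = orf_end - orf_start + 1  # both end positions belong to the ORF
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF {orf_start}-{orf_end} is not a whole number of codons")
    return span // 3


# (clone, ORF start, ORF end, reported peptide length), copied from the reports
REPORTED_ORFS = [
    ("DKFZphtes3_35n12", 128, 1072, 315),
    ("DKFZphtes3_35n24", 78, 1172, 365),
    ("DKFZphtes3_35n9", 954, 2774, 607),
]

for clone, start, end, reported in REPORTED_ORFS:
    computed = peptide_length(start, end)
    status = "consistent" if computed == reported else "MISMATCH"
    print(f"{clone}: {computed} codons, reported {reported} aa ({status})")
```

Under these assumptions all three reported peptide lengths agree with their ORF coordinates.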

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n12, frame 2

PIR:S37210 ADP,ATP carrier protein T1 - mouse, N = 1, Score = 1127, P =
2.7e-114

PIR:A44778 ADP,ATP carrier protein T1 - human, N = 1, Score = 1125, P =
4.4e-114

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila
melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P


PIR:XWBO ADP,ATP carrier protein T1 - bovine, N = 1, Score = 1121, P =

>PIR:S37210 ADP,ATP carrier protein Tl - mouse 
Length = 298

HSPs:

Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114
Identities = 214/293 (73%), Positives = 248/293 (84%)

Query:  17 assfgkdllaggvaaavsktavapiervklllqvqasskqispearykgmvdclvripre 76
           A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E
Sbjct:   5 ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE 64

Query:  77 QGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAG 136
           QGF SFWRGNLANVIRYFPTQALNFAFKDKYKQ+F+ GV++ KQFWR+F  NLASGGAAG
Sbjct:  65 QGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDRHKQFWRYFAGNLASGGAAG 124

Query: 137 ATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGI 196
           ATSLC VYPLDFARTRL  D+GKG  +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI
Sbjct: 125 ATSLCFVYPLDFARTRLAADVGKGSSQREFNGLGDCLTKIFKSDGLKGLYQGFSVSVQGI 184

Query: 197 IVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRMMMQSGE 256
           I+YRA+YFG YDT KG+LP PK    +VS+ IAQ VT  +G++SYPFDTVRRRMMMQSG
Sbjct: 185 IIYRAAYFGVYDTAKGMLPDPKNVHIIVSWMIAQSVTAVAGLVSYPFDTVRRRMMMQSGR 244

Query: 257 --AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF 307
             A   Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++
Sbjct: 245 KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297
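
The "Identities" figures in an HSP header can be re-derived from the aligned rows. The Python sketch below counts identical columns in the last block of the alignment above (query 257-307 against subject 245-297). It assumes the leading dash in the query row stands for a two-column gap, which is what makes both rows 53 columns long. The helper names are illustrative, and the truncating percentage is an inference from the figures quoted here (248/293 is reported as 84%), not a documented rule.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two gapped, aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


def percent(numerator: int, denominator: int) -> int:
    """Integer percentage, truncated rather than rounded."""
    return 100 * numerator // denominator


# Last alignment block of the PIR:S37210 hit, copied from the report
QUERY_TAIL = "--AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF"
SBJCT_TAIL = "KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY"

identical, columns = count_identities(QUERY_TAIL, SBJCT_TAIL)
print(f"{identical}/{columns} identical columns in this block")
print(f"whole HSP: {percent(214, 293)}% identities, {percent(248, 293)}% positives")
```

Counting "Positives" in the same way would additionally need a substitution matrix, which is left out of this sketch.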


Pedant information for DKFZphtes3_35n12, frame 2


Report for DKFZphtes3_35n12.2


[LENGTH]    315
[MW]        35022
[pI]        9.91
[HOMOL]     PIR:S37210 ADP,ATP carrier protein T1 - mouse 1e-115
[FUNCAT]    07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72
[FUNCAT]    08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT]    30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72
[FUNCAT]    01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT]    01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14
[FUNCAT]    07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14
[FUNCAT]    01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14
[FUNCAT]    07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14
[FUNCAT]    07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w] 1e-13
[FUNCAT]    02.13 respiration [S. cerevisiae, YBR192w] 4e-13
[FUNCAT]    01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12
[FUNCAT]    13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10
[FUNCAT]    01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10
[FUNCAT]    01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06
[FUNCAT]    07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06
[FUNCAT]    99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06
[FUNCAT]    04.05.03 mRNA processing (splicing) [S. cerevisiae, YKR052c] 2e-06
[BLOCKS]    BL00215B Mitochondrial energy transfer proteins
[BLOCKS]    BL00215A Mitochondrial energy transfer proteins
[PIRKW]     duplication 1e-115
[PIRKW]     phosphate transport 2e-09
[PIRKW]     heart 3e-24
[PIRKW]     transmembrane protein 1e-115
[PIRKW]     mitochondrial inner membrane 7e-72
[PIRKW]     transport protein 4e-08
[PIRKW]     acetylated amino end 1e-115
[PIRKW]     adipose tissue 5e-13
[PIRKW]     mitochondrion 1e-115
[PIRKW]     alternative splicing 2e-09
[PIRKW]     methylated amino acid 1e-115
[PIRKW]     chloroplast 2e-14
[PIRKW]     homodimer 1e-115
[SUPFAM]    hypothetical protein YFR045w 3e-07
[SUPFAM]    ADP,ATP carrier protein 1e-115
[SUPFAM]    Bt1 protein 2e-14
[SUPFAM]    ADP,ATP carrier protein repeat homology 1e-115
[SUPFAM]    probable carrier protein ypr021c 1e-12
[PROSITE]   MITOCH_CARRIER 3
[PFAM]      Mitochondrial carrier proteins
[KW]        TRANSMEMBRANE 2
[KW]        LOW_COMPLEXITY 4.76 %


SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE
SEG
PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh
MEM

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ
SEG
PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc
MEM

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD
SEG xxxxxxxxxxxxxxx
PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc
MEM

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS
SEG
PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec
MEM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL
SEG
PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee
MEM MMMMMMMMMMM

SEQ YDKIKEFFHIDIGGR
SEG
PRD hhhhhhheeeecccc
MEM


WO 01/12659    PCT/IB00/01496


Prosite for DKFZphtes3_35n12.2


PS00215   40->50     MITOCH_CARRIER   PDOC00189
PS00215   145->155   MITOCH_CARRIER   PDOC00189
PS00215   242->252   MITOCH_CARRIER   PDOC00189


Pfam for DKFZphtes3_35n12.2

HMM_NAME Mitochondrial carrier proteins

HMM       *pFwkciFLAGGIAGmMeHTvMFPIDtIKTRMQlQgEMpM. .ahpRYkGMI
          +F+KD+LAGG+A+++++T+++PI+++K+++Q+Q +++ RYKGM+
Query  19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67

HMM       dCFRwIwkNEGWRGLWRGLgANvIRYI PqWalRFGFYEFMKeMFiDy f ge
          DC+ +I++++G++++WRG++ANVIRY+P++A++F+F++ +K +F + +++
Query  68 DCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117

HMM       ddnyWmWFwranYMaGsmAGEwisvIitYPMWWKTRLQaDqkHphsQp R
          ++W+WF+ N+++G++AG ++S+ ++YP+++++TRL D +++++ R
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD--IGKGPEER 164

HMM       hYNGvWNcWrklYReEGgFkGLYRGWtPTWMRMIPYqmiYFfvYEtLKeW
          +++G+ +C KI +++G ++GLY+G++ +++++I+Y++ YF++Y+T K +
Query 165 QFKGLGDCIMKIAKSDG-IAGLYQGFGVSVQGIIVYRASYFGAYDTVKGL 213

HMM       l-ynytgYnPgprelCMddsPwWhWilgWmlAGMiaWivSYPfDVVRTRMM
          _ ^ ^ +++ + ++ ++++I+SYPFD+VR+RMM
Query 214 LP KPK--KTPFLVSFFIAQVVT-TCSGILSYPFDTVRRRMM 251

HMM       «dsm. edhkYqSmlDCWMqlYKnEGFkOFWKGFWPRIMRiMPWtAIMFmr
          M+S+ ++++Y+++LDC+++IY++EG+ +F++G+ +++R+ ++A+++++
Query 252 MQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGT-GGALVLVL 300

HMM       YEqMKwFL*
          Y+ +K+F+
Query 301 ydkikEFF 308
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DKFZphtes3_35n24 


group: testes derived 

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins.

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a
domain approximately one hundred amino acids long that includes a conserved intra-domain
disulfide bond ("Ig domain"). Thus, the novel protein is a new member of the Ig superfamily.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560 


1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 
51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 
101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 
151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 
201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 
251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC 
301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 
351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 
401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 
451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 
501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 
551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 
601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 
651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC
701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT
751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 
801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 
851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 
901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC 
951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 
1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 
1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 
1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 
1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 
1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 
1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 
1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 
1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 
1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 
1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 
1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 
1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 


BLAST Results 


No BLAST result


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 78 bp to 1172 bp; peptide length: 365 
Category: putative protein 
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Prosite motifs: IG_MHC (35-42) 


1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH 
51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF 
101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE 
151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY 
201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN 
251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD 
301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI 
351 QELLSLISTE DHPIT 
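
The peptide listed for frame 3 can be checked against the cDNA by translating the ORF. The Python sketch below assumes the standard genetic code (the reports do not say which table was used) and builds it from the conventional T, C, A, G codon ordering. The two fragments are copied from the insert sequence above: 33 bases starting at the ATG at position 78, and the stretch from position 1149 that runs through the stop codon. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order of
# first, second and third codon position ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 78-110 of the DKFZphtes3_35n24 insert (start of the ORF)
ORF_START = "ATGAATGTGATCTACCCACTGGCAGTCCCCAAG"
# Bases 1149-1180 of the insert (last sense codons, stop codon, 3' flank)
ORF_END = "TCAACTGAAGACCATCCCATTACTTAGTGACC"

print(translate(ORF_START))
print(translate(ORF_END))
```

Under this assumption the first fragment yields the opening residues of the listed peptide and the second yields its last eight residues.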

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n24, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35n24, frame 3


Report for DKFZphtes3_35n24.3


[LENGTH]    365
[MW]        41768.24
[pI]        5.82
[BLOCKS]    BL00273 Heat-stable enterotoxins proteins
[PROSITE]   MYRISTYL 1
[PROSITE]   IG_MHC 1
[PROSITE]   AMIDATION 1
[PROSITE]   CK2_PHOSPHO_SITE 7
[PROSITE]   TYR_PHOSPHO_SITE 4
[PROSITE]   PKC_PHOSPHO_SITE 3
[PROSITE]   ASN_GLYCOSYLATION 3
[KW]        Alpha_Beta
[KW]        LOW_COMPLEXITY 4.11 %


SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL
SEG
PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK
SEG xxxxxxxxxxxxxxx
PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL
SEG
PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV
SEG
PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD
SEG
PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE
SEG
PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc

SEQ DHPIT
SEG
PRD ccccc


Prosite for DKFZphtes3_35n24.3


PS00001   168->172   ASN_GLYCOSYLATION   PDOC00001
PS00001   272->276   ASN_GLYCOSYLATION   PDOC00001
PS00001   322->326   ASN_GLYCOSYLATION   PDOC00001
PS00005   114->117   PKC_PHOSPHO_SITE    PDOC00005
PS00005   299->302   PKC_PHOSPHO_SITE    PDOC00005
PS00005   323->326   PKC_PHOSPHO_SITE    PDOC00005
PS00006   48->52     CK2_PHOSPHO_SITE    PDOC00006
PS00006   69->73     CK2_PHOSPHO_SITE    PDOC00006
PS00006   125->129   CK2_PHOSPHO_SITE    PDOC00006
PS00006   274->278   CK2_PHOSPHO_SITE    PDOC00006
PS00006   297->301   CK2_PHOSPHO_SITE    PDOC00006
PS00006   349->353   CK2_PHOSPHO_SITE    PDOC00006
PS00006   358->362   CK2_PHOSPHO_SITE    PDOC00006
PS00007   85->93     TYR_PHOSPHO_SITE    PDOC00007
PS00007   186->194   TYR_PHOSPHO_SITE    PDOC00007
PS00007   186->194   TYR_PHOSPHO_SITE    PDOC00007
PS00007   185->194   TYR_PHOSPHO_SITE    PDOC00007
PS00008   275->281   MYRISTYL            PDOC00008
PS00009   11->15     AMIDATION           PDOC00009
PS00290   35->42     IG_MHC              PDOC00262


(No Pfam data available for DKFZphtes3_35n24.3)
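
The ASN_GLYCOSYLATION rows in the Prosite table above can be reproduced with a short pattern scan. The Python sketch below uses the pattern N-{P}-[ST]-{P} for PS00001; that pattern is taken from the published PROSITE definition, not from this document, and it is translated by hand into a regular expression. The peptide is the 365-residue sequence listed for frame 3. The "start->start+4" output format mirrors how the table appears to write its ranges, which is an observation rather than a documented convention.

```python
import re

# Peptide of DKFZphtes3_35n24, frame 3, copied from the listing (365 residues)
PEPTIDE = (
    "MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIH"
    "EKICQLLIPLRTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLF"
    "EGKHEDAVPAALQSLRFRVKLYGLSSVELVPAYPLLAEASLGLGRIVQAE"
    "EYLFQAQWTVLKSTDCSNATHSLLHRNLGLLYIAKKNYEEARYHLANDIY"
    "FASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKVSEIWHAYLNN"
    "HYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD"
    "KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTI"
    "QELLSLISTEDHPIT"
)

# PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead lets overlapping candidate sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_sites(pattern: re.Pattern, peptide: str) -> list[int]:
    """Return the 1-based start positions of all pattern matches."""
    return [match.start() + 1 for match in pattern.finditer(peptide)]


sites = find_sites(N_GLYCOSYLATION, PEPTIDE)
for start in sites:
    print(f"PS00001 {start}->{start + 4} ASN_GLYCOSYLATION")
```

Other PROSITE patterns can be scanned the same way once their definitions are converted to regular expressions.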




DKFZphtes3_35n9 


group: metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human
carboxylesterase (EC 3.1.1.1).

The novel protein contains one carboxylesterase B1 pattern and one B2 pattern. In comparison to
EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 458-474 are missing.

The new protein can find application in modulation of carboxylester metabolism and as a new
enzyme for biotechnological production processes.


carboxylesterase, splice variant

5' extension of mRNA and N-terminal elongation of protein (64 aa),
missing exon: aa 458-474 of JC5408 are missing

Sequenced by DKFZ

Locus: unknown

Insert length: 2888 bp 

Poly A stretch at pos. 2878, no polyadenylation signal found 


1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 
151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 
201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 
301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 
351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAAGTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 
451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG
501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 
551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 
601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 
651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA 
701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT 
751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 
801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 
851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 
901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 
951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 
1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 
1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 
1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 
1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 
1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 
1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 
1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 
1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 
1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 
1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 
1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 
1551 GAAGGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 
1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 
1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 
1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 
1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 
1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 
1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 
1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 
1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 
2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 
2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 
2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 
2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 
2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 
2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 
2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC 
2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC 
2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC 
2451 TACGAGTTCC AGCATCAGCC CAGCTGGCTC AAGAACATCA GGCCACCGCA 
2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA 
2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT 
2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG
2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA
2851 CTAAGGAGAA AGAAGTTGAT TCCTTCATAA AAAAAAAA


BLAST Results 


Entry D50579 from database EMBL: 

Homo sapiens mRNA for carboxylesterase, complete cds 
Score = 7197, P = 0.0e+00, identities = 1441/1443

Entry JC5408 from database PIR:
carboxylesterase (EC 3.1.1.1) - human
Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559,
frame +3


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295)
CARBOXYLESTERASE_B_2 (185-196)


1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ
51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT
101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR
151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE
201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS
251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS
301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE
351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP
401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG
451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY
501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG
551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP
601 EERHTEL


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n9, frame 3 
PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808,

TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human
S^i®^*"*"^^ ^^^^'^^ co«»Plete cds., N =1, Score - 2761/? o 

1 • 8e""Z87 

1985?'p'!%ae-205"''' ''''' ' " ^ = ^^^^ 

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus
norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score =


>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human
Length = 559

HSPs:

Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292
Identities = 542/559 (96%), Positives = 543/559 (97%)

Query:  65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124
           MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG
Sbjct:   1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184
           IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS
Sbjct:  61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244
           EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304
           VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364
           PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424
           LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484
           KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528
           LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588
           EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540

Query: 589 ALPQKIQELEEPEERHTEL 607
           ALPQKIQELEEPEERHTEL
Sbjct: 541 ALPQKIQELEEPEERHTEL 559


Pedant information for DKFZphtes3_35n9, frame 3 


Report for DKFZphtes3_35n9.3


[LENGTH]    607
[MW]        67051.20
[pI]        6.11
[HOMOL]     PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0
[BLOCKS]    BL01173A Lipolytic enzymes "G-D-X-G" family, histidine
[BLOCKS]    BL00122G
[BLOCKS]    BL00122F
[BLOCKS]    BL00122E
[BLOCKS]    BL00122D Carboxylesterases type-B serine proteins
[BLOCKS]    BL00122C Carboxylesterases type-B serine proteins
[BLOCKS]    BL00122B Carboxylesterases type-B serine proteins
[BLOCKS]    BL00122A Carboxylesterases type-B serine proteins
[SCOP]      3.56.1.1.4 Bile-salt activated lipase (Bovine (Bos taurus 1e-158
[SCOP]      d2ack 3.56.1.1.1 Acetylcholinesterase (Electric ray (Torped 1e-170
[SCOP]      d1thg 3.56.1.9.7 type-B carboxylesterase/lipase (fungu 1e-149
[EC]        3.1.1.13 Sterol esterase 1e-52
[EC]        3.1.1.7 Acetylcholinesterase 5e-74
[EC]        3.1.1.1 Carboxylesterase 0.0
[EC]        3.1.1.8 Cholinesterase 5e-68
[EC]        3.1.1.59 Juvenile-hormone esterase 1e-34
[EC]        3.1.1.3 Triacylglycerol lipase 3e-52
[PIRKW]     duplication 2e-47
[PIRKW]     homotetramer 3e-67
[PIRKW]     transmembrane protein 9e-44
[PIRKW]     microsome 1e-130
[PIRKW]     pancreas 3e-52
[PIRKW]     endoplasmic reticulum 1e-134
[PIRKW]     homotrimer 1e-134
[PIRKW]     phosphatidylinositol linkage 5e-1?
[PIRKW]     synapse 3e-73
[PIRKW]     liver 1e-131
[PIRKW]     heparin binding 3e-52
[PIRKW]     phosphoprotein 7e-25
[PIRKW]     glycoprotein 1e-134
[PIRKW]     thyroid hormone biosynthesis 2e-47
[PIRKW]     carboxylic ester hydrolase 0.0
[PIRKW]     monomer 2e-42
[PIRKW]     disulfide bond 2e-31
[PIRKW]     mammary gland 3e-52
[PIRKW]     alternative splicing 5e-74
[PIRKW]     iodine 2e-47
[PIRKW]     pyroglutamic acid 6e-39
[PIRKW]     hydrolase 1e-135
[PIRKW]     muscle 3e-73
[PIRKW]     thyroid gland 2e-47
[PIRKW]     membrane protein 3e-73
[PIRKW]     neurotransmitter degradation 3e-73
[PIRKW]     cholesterol 3e-52
[PIRKW]     homodimer 2e-47
[PIRKW]     nerve 3e-73
[SUPFAM]    cholinesterase 0.0
[SUPFAM]    triacylglycerol lipase 1e-32
[SUPFAM]    cholinesterase homology 0.0
[SUPFAM]    thyroglobulin 2e-47
[SUPFAM]    thyroglobulin type I repeat homology 2e-47
[SUPFAM]    juvenile-hormone esterase 2e-35
[SUPFAM]    probable lipolytic protein ybaC 1e-07
[PROSITE]   CARBOXYLESTERASE_B_2 1
[PROSITE]   CARBOXYLESTERASE_B_1 1
[PFAM]      Carboxylesterases
[KW]        Alpha_Beta
[KW]        3D
[KW]        LOW_COMPLEXITY 3.95 %


SEQ   MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET
SEG   xxxxxxxx...
1acj-

SEQ   SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ
SEG   xxxxx
1acj- ETTEEEECEEEEETTEE-- EE

SEQ   TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS
SEG
1acj- EEEEEECEETTTGGGTTTCCEECCCCCCEEECrcCCCBCCCCCCm

SEQ   DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ
SEG
1acj- CCBTTTTCEEEEEET--TTTTTTEEEEEEECT'iTT^^

SEQ   YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS
SEG
1acj- CCCCGGGCCCTTTTTTTCCHHHHHHHHHHHHHHHCGGGGCBE^

SEQ   LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS
SEG
1acj-

SEQ   KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI
SEG
1acj- HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEE^

SEQ   YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF
SEG
1acj- TTTCCCCCHHHHHHHHHHHTTOTCHHHHH^

SEQ   VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA
SEG
1acj- HHHHHHHHHHHHCCCCEEEEEECCCC

SEQ   NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP
SEG   xxxxx
1acj- HHHHHCCCCCCC-- CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH.

SEQ   EERHTEL
SEG   xxxxxx .
1acj-


Prosite for DKFZphtes3_35n9.3


PS00122   279->295   CARBOXYLESTERASE_B_1   PDOC00112
PS00941   185->196   CARBOXYLESTERASE_B_2   PDOC00112


Pfam for DKFZphtes3_35n9.3


HMM_NAME Carboxylesterases

HMM       *MfMnwlimFLLwinItWli . WheqaprpPdPyiVdtnnCGklRGmNedtD
          + ^L+++ ++++++++ I T+ G + G ++ +
Query  69 RLRARLSAVACGLLLLLVRGQGQDSASP----IRTTHT-GQVLGSLVHVK 113

HMM       NG . . pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW
          + + +FLGIP+A+PP+G LRF +P+P +E W++V++ T+ P MC+Q+
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162

HMM       ndFGFWlFdmieMWNeniP. .eMSEDCLYLNVWTPWnrkPNskLPVMVWI
          +++ +++N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI
Query 163 LTAV--ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210

HMM       HGGGFMFGSGhsYPliqYDgeylMMeeNVIVVtlNYRLGPFGFLSTgDid
          HGG+++FG + ++YDG+ L++ ENV+W I+YRLG++GF+STGD +
Query 211 HGGALVFGMA-----SLYDGSMLAALENVVVVIIQYRLGVLGFFSTGDKH 255

HMM       1 PPHGNWGLWDQRMALQWVQDNI An FGGDPNNIT I FGESAGGMS VHl HML
          + GNWG++DQ++AL+WVQ+NIA+FGG+P+++TIFGESAGG+SV+ ++
Query 256 AT--GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303

HMM       SYGGDNPPmfKqLFHRAIMQSGsAmcPWvIQsnyNaRqRAfRFArimGCN
          S P + +LFH AIM+SG A+ P++I S++ + +A++ C+
Query 304 S PISQGLFHGAIMESGVALLPGLIASSA--DVISTVVANLSACD 345

HMM       rmDssEMIqCLRsKPwEELWdACWnFWmWfYfPFlPWFFgPVIDGDDaPE
          + DS++++ CLR K+ EE++++++ +F + + +DG+
Query 346 QVDSEALVGCLRGKSKEEILAINK PFKMIPGV VDGV 381

HMM       aFIPDHPeeMI kEGkFnDVPWl IGYNnDEGiWFapMmMnf nWfdEDeWId
          F+P+HP+E++++ F VP I+G+NN E++W++P M + + +E++
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDT-QKEMDR 429

HMM       itNedWyeWMPYIlFYrddmsNikDMDDYiDkvyEeYPgWWDrFPqESYW
          ++ + ++ M +L + + + D ++EEY+G+ + PQ
Query 430 EASQAALQKMLTLLMLPPT-F--------GDLLREEYIGDNGD-PQTLQA 469

HMM       nLqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFtaWR
          + +Q+M+ D F++P + ++H++ +PVY+YEF+H PS +
Query 470 QFQEMMADSMFVIP--ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511

HMM       WWPpWMgvdH*
          +PP+M++DH
Query 512 IRPPHMKADH 521

HMM       *tEEEiissMRinMMNYWINFAKhGNPNnthnglCWWPqYTsnEQYdMIMe
          TEEE+ +S R MM+YW+NFA++GNPN++ GL++WP ++++EQY++ +
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570

HMM       1 1 1 mi QmC rrar DP YCN FW *
          + +++++ + FW
Query 571 QPAVGRALKAHR-- LQFW 586
DKFZphtes3_35p17 


group: testes derived 


DKFZphtes3_35p17 encodes a novel 505 amino acid protein with weak similarity to 
proteins of the armadillo family. 

Proteins of the armadillo family are involved in diverse cellular processes in higher 
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions 
in intercellular junctions and signalling cascades. Others, belonging to the importin-alpha 
subfamily, are involved in NLS recognition and nuclear transport, while some members of the 
armadillo family have as yet unknown functions. The novel protein shows similarity to 
S. cerevisiae VAC8 and to b-catenin, but contains no armadillo (arm) repeats. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S.cerevisiae VAC8 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1966 bp 

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935 
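
The two positions quoted above can be checked against the 3' end of the insert. The following is a minimal sketch, assuming that the polyadenylation signal is the canonical AATAAA hexamer and that the quoted positions are zero-based offsets into the insert (an assumption that reproduces both quoted values for this clone). The function name and the hard-coded excerpt, which is taken to be bases 1901 to 1966 of the listing, are illustrative only.

```python
def find_polya_features(seq: str, offset: int = 0, signal: str = "AATAAA"):
    """Return (signal_pos, tail_pos) as zero-based offsets into the full insert.

    `offset` is the zero-based position of seq[0] within the full insert.
    """
    seq = seq.upper()
    # Walk back over the trailing run of A's to find where the poly A stretch begins.
    tail_start = len(seq)
    while tail_start > 0 and seq[tail_start - 1] == "A":
        tail_start -= 1
    # Take the last occurrence of the signal that lies entirely upstream of the tail.
    signal_idx = seq.rfind(signal, 0, tail_start)
    signal_pos = offset + signal_idx if signal_idx >= 0 else None
    return signal_pos, offset + tail_start


# Assumed excerpt: bases 1901-1966 of the 1966 bp insert, ten bases per group.
TAIL_EXCERPT = (
    "TAGTTCAGTG" "ATGCTTTTGT" "ATTTGTGGCG" "ATTTTAATAA" "AGGATATGGC"
    "CTTCCCAAAA" "AAAAAA"
)

if __name__ == "__main__":
    print(find_polya_features(TAIL_EXCERPT, offset=1900))
```

The same hexamer also occurs further upstream in the insert, which is why the sketch searches backwards from the start of the poly A stretch rather than forwards from the 5' end.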


1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT 
51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT 
101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG 
151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG 
201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA 
251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG 
301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GAGTCATACG 
351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT 
401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC 
451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG 
501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA 
551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC 
601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC 
651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG 
701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG 
751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT 
801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG 
851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG 
901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT 
951 TGTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT 
1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA 
1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG 
1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT 
1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA 
1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT 
1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA 
1301 TCATCTAGCA GAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG 
1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA 
1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT 
1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA 
1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT 
1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA 
1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA 
1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA 
1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA 
1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC 
1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA 
1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT 
1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC 
1951 CTTCCCAAAA AAAAAA 


BLAST Results 


No BLAST result 
Medline entries 


98413148: 

Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and 
importin alpha is associated with the yeast vacuole membrane. 

98330438: 

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces 
cerevisiae vacuolar membrane that functions in vacuole fusion and inheritance. 

98158703: 

Vac8p, a vacuolar protein with armadillo repeats, functions in both 
vacuole inheritance and protein targeting from the cytoplasm to vacuole. 


Peptide information for frame 3 


ORF from 99 bp to 1613 bp; peptide length: 505 
Category: similarity to known protein 
Classification: unset 
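
The ORF coordinates and peptide length quoted above can be cross-checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and one-based, inclusive ORF coordinates that exclude the stop codon. The helper names are hypothetical, and the 30-base excerpt is taken to be positions 99 to 128 of the insert.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def orf_codons(start: int, end: int) -> int:
    """Number of codons in an ORF given one-based, inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    # Assumed excerpt: positions 99-128 of the insert (start of the frame 3 ORF).
    print(translate("ATGGTGAATATACTTGATTCTCCACACAAG"))
    print(orf_codons(99, 1613))
```

Under these assumptions the excerpt translates to the first ten residues of the listed peptide, and the coordinate span corresponds to 505 codons.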


1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH 
51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR 
101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL 
151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI 
201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN 
251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG 
301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL 
351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR 
401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ 
451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE 
501 KARYT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p17, frame 3 

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 237, P = 7.8e-17 

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215, 
P = 4.9e-14 

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA, 
complete cds, N = 1, Score = 195, P = 5.8e-12 


>PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 
Length = 578 

HSPs: 

Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17 
Identities = 106/401 (26%), Positives = 177/401 (44%) 
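
The identity and positive percentages in these HSP headers can be recomputed from the raw counts. The following is a minimal sketch, assuming that the report truncates percentages to whole numbers rather than rounding them (an assumption consistent with the headers in this section, for example 81/341 reported as 23%). The regular expression and function names are illustrative and are not part of any BLAST tool.

```python
import re

# Matches fragments such as "Identities = 106/401 (26%)".
STAT = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_hsp_stats(line: str) -> dict:
    """Return {name: (matches, aligned_length, reported_percent)} for one header line."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in STAT.findall(line)}


def truncated_percent(matches: int, aligned_length: int) -> int:
    """Percentage truncated to a whole number (assumed reporting convention)."""
    return matches * 100 // aligned_length


if __name__ == "__main__":
    stats = parse_hsp_stats("Identities = 106/401 (26%), Positives = 177/401 (44%)")
    for name, (n, d, reported) in stats.items():
        print(name, n, d, reported, truncated_percent(n, d))
```

A check of this kind is also a quick way to spot digits damaged in transcription, since a corrupted count will usually disagree with its reported percentage.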

Query: 92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSENEQLQ 151 

+GG PL A +N+ + L E Y + E ++E ++ L S++ Q+Q 

Sbjct: 45 SGG-PLKALTTLVYSDNLNLQRSAALAFAEITEKYVRQVSRE-VLEPILILLQSQDPQIQ 102 

Query: 152 EHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERIAAVTGAIWKCSISKENVT 211 

A+ A E + L+ GGL-l-PL + + DN E G I i- +N 

Sbjct: 103 VAACAALGNLAVNNENKLLIVEMGGLEPLINQMMG-DNVEVQCNAVGCITNLATRDDNKH 161 

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNWGALGECCQERENRVIVRKCGGIQPLVNLLVGIN 271 

K A+ L L + V N GAL ENR + G + LV+LL ^- 

Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221 

Query: 272 QALLVNVTKAVGACAVEPESMMIIDRLDG— VRLLWSLLKNPHPDVKASAAWALCPCIKN 329 
+ T A+ AV+ + + + + V L SL+ +P VK A AL + 
Sbjct: 

222 

Query: 

330 

Sbjct: 

282 

Query: 

388 

Sbjct: 

339 

Query: 

447 

Sbjct: 

398 

Score 

= 213 

Identities = 

Query: 

163 

Sbjct: 

36 

Query: 

222 

Sbjct: 

90 

Query: 

282 

Sbjct: 

150 

Query: 

342 ( 

Sbjct: 

208 ( 

Query: 

400 ] 

Sbjct: 

268 I 

Query: 

4 60 1 

Sbjct: 

328 I 

Score = 

180 1 


PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 

AKDAGEMVRSFVGGLELIVNLLKSDNKE~VLASVCAAITNIAKDQENLAVITDHGVV-PL 
E+VR+ GGL +V L++SD+ VLASV A I NI+ N +1 D G + PL 

TSYQLEIVRA— GGLPHLVKLIQSDSIPLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 

LSKLANTNNNKLRHHLAEAISRCCMWG-RNRVAFGEHKAVAPLVRYLKSNDTNVHRATAO 
+ L ++ + + H + +NR F E AV + +v ++ 

VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 

ALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCISNI 492 
A + + AD + + + E + L+ M S +Q++ AA ++N+ 
ACFAI LALADVSKLDLLEANI LDALI PMTFSQNQEVSGNAAAALANL 444 

Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14 
Identities = 81/341 (23%), Positives = 163/341 (47%) 

EDKETRDLVRLHGGLKPLASLLNNTD-NKERLAAVTGAIWKCSISKENVTKFREYKAIET 
EDK+ D G LK L +L+ + + N +R AA+ A I+++ V + + +E 

EDKDQLDFYS-GGPLKALTTLVYSDNLNLQRSAALAFA EITEKYVRQVSR-EVLEP 

LVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKA 
tt.r \ Q V ALG EN+^^++ + GG++PL+N ++G N + N 

ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 

VGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFV 
1 ^ I + L L K+ H V+ +A AL + ++ E+V ♦ 

ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA— 

GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI-TDHGWPLLSKLANTNNNKL 
G + ++V+LL S + +V A++NIA D+ N + T+ +V L L ++ ++++ 

GAVPVLVSLLSSTDPDVQYYCTTALSNrAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 


281 
387 
338 
446 
397 


MCI 
NEG 


221 
89 
281 
149 
341 
207 
399 
267 
459 
327 


Identities = 


Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct : 
Query: 
Sbjct: 
Query : 
Sbjct: 
Query: 
Sbjct: 


145 
58 
205 
114 
265 
174 
325 
234 
385 
294 
445 
354 


Score = 155 
Identities ' 

Query: 60 

Sbjct: 93 

Query: 117 

Sbjct: 150 

Query: 176 


+ + G +K L+ ++ D + E +S +R LA ++EK R 

.IVDAGFLKPLVRLLDYKDSE— EIQCHAVSTLRNLAASSEKNR 369 

Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10 
Identities = 80/346 (23%), Positives = 142/346 (41%) 

SENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCS 
S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ + 

SDNLNLQRSAALAFAEITE-KYVRQVSR— EVLEPILILLQSQDPQIQVAACA-ALGMLA 

ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIOPLV 
+ + EN E +E L+ + EV N VG + +N-f + G + PL 

VNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGCITNLATRDDNKHKIATSGALIPLT 

NLLVGINQALLVNVTKAVGACAVEPESMMI I DRLDGVRLLWSLLKNPHPDVKASAAWALC 

L + + N T A+ E+ + V +L SLL + PDV+ AL 

KLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTALS 

PCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVITDHGV 
+ + ++++ + +V+L+ S + V A+ N+A D I G 

NIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 

VPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 
+ P L KL +++ L I + N + + PLVR L D+ + 

LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 

A-QALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCIS 490 
A L L+ ++ N E+GAV+ ++ +Q + C + 

AVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFA 401 

Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 88/401 (21%), Positives = 175/401 (43%) 

LYEARD— VEVARCGALALWSCSKSHTNKEAIRKAGGI-PLLARLLKTSHENMLIPWGT 
L +++D ++VA C AL + + NK I + GG-h PL+ +++ + E + VG 

LLQSQDPQIQVAACAALG— NLAVNNENKLLIVEMGGLEPLINQMMGDNVE-VQCNAVGC 

LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 
+ A+ +4 + I + L K S++ ++Q + A+ +E R +LV G 

ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-G 

GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR^-EYKAIETLVGLLTDQPEEV 
+ L SLL++TD + T A+ ++ + N K E + + LV L+ V 


204 

113 

264 

173 

324- 

233 

384 

293 

444 

353 


116 
149 
175 
208 
233 
Sbjct: 209 AVPVLVSLLSSTDPDVQYYCTT-ALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 234 LVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMM 293 
           AL + ++ + + GG+ LV L+ + L++ + ++ P + 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 294 IIDRLDGVRLLWSLLK-NPHPDVKASAAWALCPCIKNA-KDAGEMVRSFVGGLELIVNLL 351 
           +I ++ L LL +++ A L ++ K+ E S G +E L 
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFES--GAVEKCKELA 385 

Query: 352 KSDNKEVLA--SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISR 409 
           V+ SCAI +AD L+++ ++ L + + N++ + A A++ 
Sbjct: 386 LDSPVSVQSEISACFAILALA-DVSKLDLL-EANILDALIPMTFSQNQEVSGNAAAALAN 443 

Query: 410 CCMWGRNRVAFGE-----HKAVAP-LVRYLKSNDTNVHRATAQALYQLSE 453 
           C N E ++ + L+R+LKS+ + QL E 
Sbjct: 444 LCSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLE 493 


Score = 139 (20.9 bits), Expect = 5.0e-06, P = 5.0e-06 
Identities = 80/329 (24%), Positives = 142/329 (43%) 


Query:  37 GGITKLVALLDCAHD-STKPAQ---SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKA 92 
           GITL DH+TA +L +++ + V R AL + + S N++ + A 
Sbjct: 148 GCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA 207 

Query:  93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150 
           G +P+L LL ++ ++ L A +E N + + E R++ LV ++S + ++ 
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 151 QEHCAMAIYQCAEDKETR-DLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKEN 209 
           + +A+ AD + ++VR GGL LL+ + D+ +A I SI N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRA-GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLN 325 

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267 
           + ++ LV LL EE+ + VL ENR +G++ L 
Sbjct: 326 EGLIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELA 385 

Query: 268 VG--INQALLVNVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-L 323 
           + ++ ++ A+ A A V ++ + LD + + + +N A+AA A L 
Sbjct: 386 LDSPVSVQSEISACFAILALADVSKLDLLEANILDAL-IPMTFSQNQEVSGNAAAALANL 444 

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELIVNLLKSD 354 
           C + N K R G ++ LKSD 
Sbjct: 445 CSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSD 476 


Score = 136 (20.4 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 72/304 (23%), Positives = 133/304 (43%) 


Query:  58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTL 117 
           + L +++ + V R AL + + S N++ + AG +P+L LL ++ ++ L 
Sbjct: 173 TKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTAL 232 

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174 
           A +E N + + E R++ LV ++S + +++ +A+ AD + ++VR 
Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291 

Query: 175 GGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLT-DQPEEV 233 
           GGL L L+ + D+ + A I SI N + ++ LV LL EE+ 
Sbjct: 292 GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350 

Query: 234 LVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLLVG--INQALLVNVTKAVGACA-VEP 289 
           + V L E NR + G ++ L + ++ ++ A+ A A V 
Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410 

Query: 290 ESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-LCPCIKN-AKDAGEMVRSFVGGLELI 347 
           ++ + LD + + + +N A+AA ALC+NK RG + 
Sbjct: 411 LDLLEANILDAL-IPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAWDRPNEGIRGFL 469 

Query: 348 VNLLKSD 354 
           + LKSD 
Sbjct: 470 IRFLKSD 476 


Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03 
Identities = 71/335 (21%), Positives = 132/335 (39%) 


Query: 

1 

MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 

60 



+ + SH++A +N+ +R++ G + LV+LL ST P 


Sbjct: 

172 

LTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLS STDP-- 

222 

Query: 

61 

YEARDVEVARCGALALWSCSKSHTNKEAI RKAGGI PLLARLLKTSHENMLI PWGTLQEC 

120 



DV+ AL+ + +++ KA++LL++ + L+ 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 


223 
121 
279 
181 
339 
240 
399 
299 
459 


Score = 106 
Identities = 


Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 


65 
139 
123 
198 
181 
257 
241 
316 


DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 278 

^ff^^?[^?^^^f^^^ENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 180 
AS+ +y+ I + +LVK + S++ L I + L+ G LKPL 

ASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPL 338 

ASLLNNTDNKERLAAVTGAIWKCSISKE-NVTKFREYKAIETLVGLLTDQPEEVLVNWG 239 
LL+ D++E + + S E N +F E A+E L D P V + 

VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISA 398 

ALGECCQERENRVIVRKCGGIQPLVNLLVGINQAI.LVNVTKAVG-ACAVEPESMMIIDRL 298 
+++ + ++ L+ + NQ + N A+ C+ 11 + 

CFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAW 458 

D GVR-LLWSLLKNPHPDVKASAAWALCPCIKNAKDAGE 335 

D G+R L LK+ + + A W + +++ D E 
DRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLESHNDKVE 500 

Score = 106 (15.9 bits), Expect = 2.0e-02, P = 2.0e-02 
Identities = 49/204 (24%), Positives = 89/204 (43%) 

^!^^w^'^S^^^"^"^^^^^"'^"*^^^^^^^^G^P^^«LLKTSHEN^^ 122 
+VEV +C A+ + + + NK I +G + L +L K+ H + G L S 

NVEV-QCNAVGCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHS 197 

EENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRD-LVRLHGGL-KPL 180 
EEN + + A + LV L+S + +Q +C A+ A D+ R L + L L 
EENRKELVNAGAV-PVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKIAQTEPRLVSKL 256 

ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNWGA 240 
SL+++ ++ + A T A+ + + + LV L+ +++ V 

VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315 

LGECCQERENRVIVRKCGGIQPLVNLL 267 
+ N ++ G 4-+PLV LL 

IRNISIHPLNEGLIVDAGFLKPLVRLL 342 


Pedant information for DKFZphtes3_35p17, frame 3 


Report for DKFZphtes3_35p17.3 


[LENGTH] 

[MW] 

IpIJ 

IHOMOLI 

(FUNCAT) 

[FUNCATJ 

8e-18 

[FUNCAT) 

[FUNCATJ 

( FUNCAT 1 

[FUNCATJ 

(BLOCKS J 

(BLOCKS J 

(SCOPJ 

( PIRKWJ 

[PIRKW] 

(PIRKWJ 

[PIRKWJ 

( PIRKW] 

[SUPFAM] 

(KWJ 

(KWJ 

(KWJ 


505 

55224.34 
8.43 

PIR:S50446 VACS protein - yeast (Saccharomyces cerevisiae) 2e-16 

nfin! lysosomal organization (S. cerevisiae, yEL013w] 8e-18 

06.04 protean targeting, sorting and translocation [s, cerevisiae, YEL013wJ 

oe'o? nn^r^^".^""^ lysosomal biogenesis (S. cerevisiae, YEL013wJ 8e-18 

08,01 nuclear transport [S. cerevisiae, YNL189w] 3e-06 

?nTn ^ control and mitosis [S. cerevisiae, YNL189wJ 3e-06 

Bl6i265C °^g^"^^^tion [s. cerevisiae, YNL189wI 3e-06 

BL00242A Integrins alpha chain proteins 

cytos^ 3e-ii^*^'^ beta-Catenin [Mouse (Mus musculus) 7e-18 

apoptosis 3e-ll 

carcinogenesis 3e-ll 

cell adhesion 3e-H 

cytos):eleton 3e-12 

pendulin le-07 

All_Alpha 

3D 

LOW^COMPLEXITY 2.38 % 


SEQ MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 
SEG xxxxxxxxxxxx 
2bct- HH 

SEQ YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 
SEG 
2bct- HHCCCHHHHHHHiHAHiAAAicAAHHHHHAHCCHHAHiliA^ 

SEQ ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 
SEG 
2bct- HHTTTHHHHHHHHCHHHHHHHHHCCCCHHHHTO 

SEQ ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 
SEG 
2bct- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDG 
SEG 
2bct- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH 

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA 
SEG 
2bct- HHHHHHHHHTTTHHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH 

SEQ SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAF 
SEG 
2bct- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHHCHHHHH 

SEQ GEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCITMHENGAVKLLLDMVGSPDQD 
SEG 
2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT 
SEG 
2bct- HHHHHHHHH 

(No Prosite data available for DKFZphtes3_35p17.3) 
(No Pfam data available for DKFZphtes3_35p17.3) 
DKFZphtes3_35p22 


group: cell cycle 

DKFZphtes3_35p22 encodes a novel 549 amino acid protein, with similarity to oncogene 1 (tre-2 
locus). 

The novel protein is closely related to human tre-2 and other enzymes involved in the 
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating 
enzyme, indicating a role for the ubiquitin system in mammalian growth control. 

The novel protein can find application in cancer diagnostics and treatment, and in regulating 
protein stability and growth control via regulation of ubiquitination. 


strong similarity to oncogene 1 (tre-2 locus) 
membrane regions: 1 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: map^"!?" 
Insert length: 2072 bp 

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039 

1 GTTACACACA GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 
51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 
101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 
151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG 
201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 
251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 
301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 
351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 
401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 
451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG 
501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 
551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 
601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 
651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 
701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 
751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 
801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 
851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT 
901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 
951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 
1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 
1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 
1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 
1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 
1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 
1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT 
1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 
1351 GTGGGGCTGT GCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 
1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA 
1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT 
1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 
1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 
1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 
1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 
1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 
1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 
1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 
1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG 
1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 
1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 
2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 
2051 CCTGTTGAAA TGAAAAAAAA AA 
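
A numbered listing of this kind can be read back into a contiguous string and checked against the stated insert length. The following is a minimal sketch, assuming each row starts with the one-based position of its first base followed by space-separated groups of bases. The function name is hypothetical, and rows whose position label has been split by transcription damage would need repairing before they parse.

```python
def parse_numbered_block(lines):
    """Return (start, sequence) from rows such as '2051 CCTGTTGAAA TGAAAAAAAA AA'.

    Raises ValueError if a row's position label does not follow on from the
    previous row, which usually indicates a damaged or missing row.
    """
    parts = []
    first_start = None
    expected = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        start = int(fields[0])
        chunk = "".join(fields[1:]).upper()
        if first_start is None:
            first_start = start
        elif start != expected:
            raise ValueError(f"row labelled {start}, expected {expected}")
        parts.append(chunk)
        expected = start + len(chunk)
    return first_start, "".join(parts)


if __name__ == "__main__":
    rows = [
        "2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT",
        "2051 CCTGTTGAAA TGAAAAAAAA AA",
    ]
    start, seq = parse_numbered_block(rows)
    print(start, start + len(seq) - 1)  # first and last base positions covered
```

For the last two rows of the listing, the final position covered agrees with the stated insert length of 2072 bp.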


BLAST Results 


Entry AC003976 from database EMBL: 

Homo sapiens chromosome 17, clone hCIT.91_J_4, complete sequence. 
Score = 4385, P = 0.0e+00, identities = 881/886 
14 exons 

Entry HSG19723 from database EMBL: 
human STS A001W35. 
Score = 850, P = 1.9e-32, identities = 170/170 


Medline entries 


92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme 
related to a product of the human tre-2 oncogene. 

95176708: 

UBP5 encodes a putative yeast ubiquitin-specific protease 
that is related to the human Tre-2 oncogene product. 


Peptide information for frame 3 


ORF from 99 bp to 1745 bp; peptide length: 549 
Category: strong similarity to known protein 


1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG 
51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK 
101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS 
151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY 
201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT 
251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI 
301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL 
351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP 
401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL 
451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE 
501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p22, frame 3 

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score = 
2181, P = 5.5e-226 

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157 


>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 
Length = 786 

HSPs: 

Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226 
Identities = 405/500 (81%), Positives = 440/500 (88%) 

Query: 1 MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 60 

MD+VE A S AQER+DI+MKY+KGHRAGLPEDKGP+P N+++D GI+HETELPP+ 
Sbjct: 1 MDMVENADSLQAQERKDILMKYDKGHRAGLPEDKGPEPV-GINSSIDRFGILHETELPPV 59 

Query: 61 TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 120 

TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WSVLLN +E+ 
Sbjct: 60 TAREAKKIRREMTRTSKWMEMLGEWETYKHSSKLIDRVYKGIPMNIRGPVWSVLLNIQEI 119 

Query: 121 KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 180 

KLKNPGRYQIMKE+GK+SSEHI ID DV TLR H+FFRDRYG KQREL +ILLAY EY 
Sbjct: 120 KLKNPGRYQIMKERGKRSSEHIHHIDLDVRTTLRNHVFFRDRYGAKQRELFYILLAYSEY 179 

Query: 181 NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 240 

NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERHSL GFHSPNGGTVQGLQDQQE 
Sbjct: 180 NPEVGYCRDLSHITALFLLYLPEEDAFWALVQLLASERHSLPGFHSPNGGTVQGLQDQQE 239 
Query: 241 HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 300 

HVV SQPKTM HQDK+ LCGQC+ LGCL+R LIDGISLGLTLRLWDVYLVEGEQ LMPI 
Sbjct: 240 HVVPKSQPKTMWHQDKEGLCGQCASLGCLLRNLIDGISLGLTLRLWDVYLVEGEQVLMPI 299 

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360 

T IA KVQQKRL KTSRCG WAR N+F DTWA ++DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct: 300 TSIALKVQQKRLMKTSRCGLWARLRNQFFDTWAMNDDTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 480 

VREDTYPVGTQGVPS ALAQGGPQGSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSLALAQGGPQGSWRFLEWKSMPRLPTDLDIGGPWFPHYDFERSCWV 479 

Query: 481 RAISQEDQLAPCWQAEHPAE 500 

RAISQEDQLA CWQAEH E 
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499 


Pedant information for DKFZphtes3_35p22, frame 3 


Report for DKFZphtes3_35p22.3 


[LENGTH]   549 
[MW]       62159.16 
[pI]       9.23 
[HOMOL]    PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0 
[FUNCAT]   11.01 stress response [S. cerevisiae, YGR100w] 2e-16 
[FUNCAT]   04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15 
[PIRKW]    transmembrane protein 6e-14 
[PROSITE]  MYRISTYL 6 
[PROSITE]  AMIDATION 1 
[PROSITE]  CAMP_PHOSPHO_SITE 3 
[PROSITE]  CK2_PHOSPHO_SITE 4 
[PROSITE]  TYR_PHOSPHO_SITE 2 
[PROSITE]  PKC_PHOSPHO_SITE 10 
[KW]       TRANSMEMBRANE 1 
[KW]       LOW_COMPLEXITY 5.28 % 


SEQ MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 
SEG 
PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 
MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 
SEG 
PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 
MEM 

SEQ KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 
SEG 
PRD ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 
MEM 

SEQ NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 
SEG 
PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 
MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 
SEG 
PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 
MEM MMMMMMMMMMMMMMMMMM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 
SEG 
PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhcccc 
MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 
MEM 

SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 
SEG 
PRD cccccccccccccccccccccccccceeeeeccccccccccccccccccccccccccccc 
MEM 

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL 
SEG 
PRD cchhhhtihhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccceeee 
MEM 

SEQ ESSQFPPGF 
SEG 
PRD ccccccccc 
MEM 


Prosite for DKFZphtes3_35p22.3 

PS00004   136->140   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   310->314   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   348->352   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   61->64     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   73->76     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   90->93     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   152->155   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   216->219   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   282->285   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   315->318   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   346->349   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   351->354   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   446->449   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   61->65     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   460->464   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   484->488   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   511->515   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   93->100    TYR_PHOSPHO_SITE    PDOC00007 
PS00007   92->100    TYR_PHOSPHO_SITE    PDOC00007 
PS00008   8->14      MYRISTYL            PDOC00008 
PS00008   101->107   MYRISTYL            PDOC00008 
PS00008   230->236   MYRISTYL            PDOC00008 
PS00008   276->282   MYRISTYL            PDOC00008 
PS00008   366->372   MYRISTYL            PDOC00008 
PS00008   441->447   MYRISTYL            PDOC00008 
PS00009   134->138   AMIDATION           PDOC00009 
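
The short Prosite sites listed above can be reproduced with a simple pattern scan. The following is a minimal sketch, assuming the protein kinase C site (PS00005) is the three-residue pattern [ST]-x-[RK], that matches are reported without overlap, and that each range is written as a one-based start followed by the position just past the last matched residue (assumptions that reproduce the three PKC rows falling in residues 51 to 100). The function name is hypothetical, and the excerpt reads residues 72 and 73 as I and S, as the cDNA indicates.

```python
import re

# Assumed regular-expression form of the Prosite pattern [ST]-x-[RK] (PS00005).
PKC_SITE = re.compile(r"[ST].[RK]")


def scan_sites(peptide: str, pattern: re.Pattern, offset: int = 0):
    """Return non-overlapping matches as (start, end) pairs.

    `start` is the one-based position of the first matched residue and `end`
    is the position just past the last one, mirroring the 'start->end' ranges
    in the table. `offset` is the number of residues preceding the excerpt.
    """
    return [
        (offset + m.start() + 1, offset + m.end() + 1)
        for m in pattern.finditer(peptide)
    ]


# Assumed excerpt: residues 51-100 of the frame 3 peptide, ten residues per group.
EXCERPT = "IVHETELPPL" "TAREAKQIRR" "EISRKSKWVD" "MLGDWEKYKS" "SRKLIDRAYK"

if __name__ == "__main__":
    for start, end in scan_sites(EXCERPT, PKC_SITE, offset=50):
        print(f"{start}->{end}")
```

Because the scan does not report overlapping matches, the candidate site beginning at residue 91 is skipped once the match at residue 90 has been taken, which agrees with the table.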


(No Pfam data available for DKFZphtes3_35p22.3) 
DKFZphtes3_4b4 


group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human 
trypsin inhibitor. 

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2, 
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins 
from eukaryotes that have been found to be evolutionarily related. The exact function of these 
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes or as a new protease inhibitor. 


strong similarity to trypsin inhibitor 
might be a new protease inhibitor? 

Sequenced by AGOWA 

Locus: /map="333.4 cR from top of Chr16 linkage group" 
Insert length: 4574 bp 

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539 


1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 
51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 
101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 
151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 
201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 
251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 
301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 
351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC 
401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 
451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG 
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 
551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT 
601 GACGAGGTGA AGGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG 
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA 
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 
1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC 
1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 
1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 
1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 
1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 
1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 
1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 
1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 
1401 CCCGTTTGAA AAGCCAGCAA CTCACTGCCC AAGAATCCAT TGTCCGGCAC 
1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 
1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 
1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA
1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 
1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 
1701 TTCCAGCACC AGGGGAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 
1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT 
1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 
1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 
1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGQGTCTCCA TCTGGACGTC 
1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 
2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC 
2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC 
2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 
2151 AAGTAGAAGG TAGTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 
2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 
2251 AGTAAGAGGG CTGCGGGTAT GAGAGACCCC GGCTCCGCCC TGGCACGTGT 
2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 
2351 GCTCTATACA CAGCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG 
2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATGGC CCACCTGTTT 
2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT 
2501 TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC 
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA 
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA 
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA 
2751 GGAATGGAGT CTTTGGTACA TTCCTCACCG AGGTTAGCAG CTCAGTTTGT 
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT 
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC 
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT 
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG 
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA 
3101 ATATTTCTTA GGTGAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT 
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT 
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC 
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT 
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG 
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CAGCCCCAAA TGTCGGAGAG 
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT 
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG 
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT 
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC 
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA 
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG 
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT TTTTTTTTTT
3751 TTTCCCGAGA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG 
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT 
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG 
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG 
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA 
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT 
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA 
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT 
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT 
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA 
4251 GGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA 
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT 
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG 
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA 
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT 
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT 
4551 GAAAACCAAA AAAAAAAAAA AAAA 


BLAST Results 


Entry HS834352 from database EMBL: 
human STS wr-15502. 
Score = 1331, P = 5.4e-54, identities = 281/301


Medline entries 


98146272:
cDNA Cloning of a novel trypsin inhibitor with similarity to
pathogenesis-related proteins, and its frequent expression in human
brain cancer cells.


Peptide information for frame 1 


ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 
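
The ORF coordinates and the peptide length are related by simple arithmetic. The sketch below assumes, as the numbers in this listing suggest, that the ORF range is 1-based, inclusive, and excludes the stop codon, and that the frame number is derived from the start position; the function name is an illustrative assumption.

```python
def orf_summary(start, end):
    """Summarise an ORF given 1-based inclusive nucleotide coordinates.

    Assumes the range excludes the stop codon, so its length is a
    multiple of three and equals three times the peptide length.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {"codons": length_nt // 3, "frame": (start - 1) % 3 + 1}


print(orf_summary(205, 1695))  # {'codons': 497, 'frame': 1}
```

For this clone the result agrees with the stated peptide length of 497 and with frame 1.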


1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA 
51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE 
101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC 
151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY 
201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN 
251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG 
301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI 
351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP 
401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS 
451 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1 

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1)
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 738, P =
4.5e-73

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA,
complete cds., N = 1, Score = 345, P = 2e-31

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1
precursor - human, N = 1, Score = 337, P = 1.7e-30


>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein
1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete
cds.

Length = 188

HSPs: 

Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97
Identities = 160/185 (86%), Positives = 170/185 (91%)
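
The percentages in the HSP summary follow directly from the match counts. A minimal sketch, assuming (from the figures in this listing, where 170/185 is reported as 91% rather than 92%) that the report truncates rather than rounds; the function name is illustrative.

```python
def blast_percent(matches, aligned):
    """Percentage of matching positions, truncated to a whole number."""
    return 100 * matches // aligned


print(blast_percent(160, 185))  # identities
print(blast_percent(170, 185))  # positives
```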

Query: 61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120

MLHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL HWGR 
Sbjct: 1 MLHNKLRGQVYPPASNMEYMTWDEELERSAAAWAQRCLWEHGPASLLVSIGQNLAVHWGR 60 

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180

YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC 
Sbjct: 61 YRSPGFHVQSWYDEVKDYTYPYPHECNPWCPERCSGAMCTHYTQMVWATTNKIGCAVHTC 120 

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240 

R M+VWG++WENAVY VCNYSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y 
Sbjct: 121 RSMSVWGDIWENAVYLVCNYSPKGNWIGEAPYKHGRPCSECPSSYGGGCRNNLCYREEHY 180 

Query: 241 TPKPE 245
KPE 

Sbjct: 181 HQKPE 185 


Pedant information for DKFZphtes3_4b4, frame 1 


Report for DKFZphtes3_4b4.1


[LENGTH] 497
[MW] 55920.00
[pI] 8.36
[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25 kDa trypsin inhibitor, complete cds. 6e-78
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078c] 8e-12
[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[PIRKW] glycoprotein 5e-22
[PIRKW] blocked amino end 5e-13
[PIRKW] brain 9e-30
[PIRKW] hydrolase 4e-09
[PIRKW] hemolymph coagulation 4e-09
[PIRKW] zymogen 4e-09
[PIRKW] alternative splicing 4e-09
[PIRKW] sperm 5e-22
[PIRKW] viroid-induced protein 2e-11
[PIRKW] venom 6e-18
[PIRKW] pyroglutamic acid 2e-11
[PIRKW] transmembrane protein 2e-10
[PIRKW] serine proteinase 4e-09
[SUPFAM] C-type lectin homology 4e-09
[SUPFAM] trypsin homology 4e-09
[SUPFAM] complement factor H repeat homology 4e-09
[SUPFAM] cysteine-rich secretory protein 1 6e-24
[SUPFAM] pathogenesis-related leaf protein 7e-15
[PROSITE] MYRISTYL 8
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 3
[PROSITE] SCP_AG5_PR1_SC7_2 1
[PFAM] SCP-like extracellular Proteins
[KW] All_Beta
[KW] SIGNAL_PEPTIDE 23
[KW] LOW_COMPLEXITY 1.21 %


SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL
SEG xxxxxx
PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR
SEG
PRD hhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeecc

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC
SEG
PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY
SEG
PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVNYMTQVVRCDTKMKDRCKG
SEG
PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV
SEG
PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE
SEG
PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES
SEG
PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecccceeeee

SEQ LGTPRDGKAFRIFAVRQ
SEG
PRD ccccccccceeeeeccc


Prosite for DKFZphtes3_4b4.1


PS00001   27->31     ASN_GLYCOSYLATION    PDOC00001
PS00001   41->45     ASN_GLYCOSYLATION    PDOC00001
PS00001   451->455   ASN_GLYCOSYLATION    PDOC00001
PS00004   181->185   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   276->280   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   464->468   CAMP_PHOSPHO_SITE    PDOC00004
PS00005   170->173   PKC_PHOSPHO_SITE     PDOC00005
PS00005   179->182   PKC_PHOSPHO_SITE     PDOC00005
PS00005   201->204   PKC_PHOSPHO_SITE     PDOC00005
PS00005   228->231   PKC_PHOSPHO_SITE     PDOC00005
PS00005   241->244   PKC_PHOSPHO_SITE     PDOC00005
PS00005   362->365   PKC_PHOSPHO_SITE     PDOC00005
PS00005   471->474   PKC_PHOSPHO_SITE     PDOC00005
PS00005   483->486   PKC_PHOSPHO_SITE     PDOC00005
PS00006   29->33     CK2_PHOSPHO_SITE     PDOC00006
PS00006   75->79     CK2_PHOSPHO_SITE     PDOC00006
PS00006   81->85     CK2_PHOSPHO_SITE     PDOC00006
PS00006   130->134   CK2_PHOSPHO_SITE     PDOC00006
PS00006   453->457   CK2_PHOSPHO_SITE     PDOC00006
PS00006   483->487   CK2_PHOSPHO_SITE     PDOC00006
PS00007   385->393   TYR_PHOSPHO_SITE     PDOC00007
PS00008   111->117   MYRISTYL             PDOC00008
PS00008   115->121   MYRISTYL             PDOC00008
PS00008   174->180   MYRISTYL             PDOC00008
PS00008   204->210   MYRISTYL             PDOC00008
PS00008   227->233   MYRISTYL             PDOC00008
PS00008   300->306   MYRISTYL             PDOC00008
PS00008              MYRISTYL             PDOC00008
PS00008   470->476   MYRISTYL             PDOC00008
PS01010   195->207   SCP_AG5_PR1_SC7_2    PDOC00772
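
The Prosite hits listed above are pattern matches against the peptide. As an illustration, the sketch below scans for the N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, translated into a regular expression. The function name is an assumption, and the peptide fragment is the first 60 residues of frame 1 as given in this listing. The table appears to report the end coordinate as one past the last matched residue (for example 27->31 for a four-residue pattern); the sketch returns start positions only.

```python
import re

# PS00001: N-{P}-[ST]-{P}; a lookahead is used so overlapping sites are reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_sites(peptide):
    """Return 1-based start positions of N-{P}-[ST]-{P} matches."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(peptide.upper())]


first_60 = "MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL"
print(find_n_glycosylation_sites(first_60))  # [27, 41]
```

The two start positions agree with the first two ASN_GLYCOSYLATION rows of the table.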


Pfam for DKFZphtes3_4b4.1


HMM_NAME SCP-like extracellular Proteins

HMM       *PQDEQDEWLNkHNDFRQQVGRGLETRGNPGPQPPAsNMnPMVWNDELAt
          P + ++E+L HN +R QV P ASNM M+W+DEL +
Query 52  PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 88

HMM       lAQnWANOCiFDHHDCCWNHsnYPYGQNIAWWSsTANnPWnWssMIQMWY
          A WA+QCI +H ++ + S GQN+ + + ++++ +Q+WY
Query 89  SAAAWASQCIWEHGPTSLLVSI--GQNLGAHWG RYRSPGFHVQSWY 132

HMM       NEvkDYNYNWNTCkGG NNFmVCGHYTQMVWRnTf rlGCGRYICYC
          +EVKDY Y + + +c HYTQ+VW+ T +IGC+ C+
Query 133 DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 182

HMM       NNNWrKPDPWKhkWYYVCNYCPpGNYmN*
          + W + W+ +Y VCNY P+GN+++
Query 183 MTVW--GEVWENAVYFVCNYSPKGNWIG 208
DKFZphtes3_4f17


group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to
methyl-CpG-binding proteins.

Methylation at the DNA sequence 5'-CpG is required for mammalian development.
Methyl-CpG-binding proteins bind specifically to methylated DNA via a related amino acid
motif and can repress transcription. The novel protein does not contain such a motif.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.


similarity to methyl-CpG-binding protein 

extension of HS557771/HSZ78337, 

there are some differences to these sequences 

Sequenced by AGOWA 

Locus: /map="18"

Insert length: 2320 bp 

Poly A stretch at pos. 2266, polyadenylation signal at pos. 2251


1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGGGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGG AGTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 
351 AGCAGTGAGC CCCGGGATGA GGGTGGAGGG CGCAAGAGGC CTGTCCCTGA 
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC 
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG 
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CAGCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC 
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGGAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 
1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC 
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTCTGCCGCC 
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT 
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACCTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCAGAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA 


BLAST Results 
Entry HS557771 from database EMBLEST:
Human chromosome 18 clone 2 mRNA sequence.
Score = 7582, P = 0.0e+00, identities = 1560/1598

Entry HSZ78337 from database EMBLEST:
H.sapiens mRNA, expressed sequence tag ICRFp507H02194 (5')
Score = 6339, P = 9.0e-281, identities = 1307/1347

Entry HS095149 from database EMBL:
human STS wi-6941.
Score = 1210, P = 2.2e-49, identities = 246/251


Medline entries 

98449942: 

Identification and characterization of a family of mammalian methyl-CpG
binding proteins.

9824997:

Gene silencing by methyl-CpG-binding proteins.


Peptide information for frame 3 


ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 
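
The peptide is the conceptual translation of the ORF. A minimal sketch of that translation step, assuming the standard genetic code (built here from the conventional TCAG-ordered amino acid string); the function name is illustrative, and the test fragment is the first 24 nucleotides of this ORF, starting at position 57 of the insert sequence above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


print(translate("ATGGAGGGAGATGGTTCAGACCCA"))  # MEGDGSDP
```

The output matches the first eight residues of the peptide listed for frame 3.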


1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF 
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS 
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT 
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR 
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK 
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD 
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ 
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA 
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR 
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP 
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE 
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL
651 RSSADR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4f17, frame 3

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid
F52B11, N = 2, Score = 316, P = 8.8e-27

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene,
partial cds., N = 2, Score = 163, P = 2.8e-13

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative
transcriptional regulatory protein, phd finger containing"; S.pombe
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA,
complete cds., N = 2, Score = 189, P = 7.6e-11


>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11
Length = 523

HSPs: 

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27
Identities = 100/336 (29%), Positives = 167/336 (49%)
Query: 333 REKKSEKKKEERYKRHRQ-KQKHKDKWKHPERADAKDPASLP-QCLGPGCVRPAQPSSKY 390

+++K+ E Y +R +Q+ D + + +A +P P QCL P C+ ++ SKY 
Sbjct: 118 OQRKANIINERDYVPNRPTRQQSADLRRKRTQLNA-EPDKHPRQCLNPNCIYESRIDSKY 176 

Query: 391 CSDDCGMKLAANRIYEILPQRIQQW QQSPCIAEEHGKKLLERIRREQQSARTRLQ 445

CSD+CG +LA R+ EILP R +Q+ P E+ K +1 RE Q + 

Sbjct: 177 CSDECGKELARMRLTEILPNRCKQYFFEGPSGGPRSLEDEIKPKRAKINREVQKLTESEK 236 

Query: 446 EMERRFHEL-EAIILRAKQQAVREDEESNEGDSDDTDLQIFCVSCGHPINPRVAL-RHME 503 

M ++L E I + K Q + +E D +L C+ CG P P + +H+E 

Sbjct: 237 NMMAFLNKLVEFIKTQLKLQPLGTEERY DDNLYEGCIVCGLPDIPLLKYTKHIE 290 

Query: 504 RCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKTYCKRLQVLCPEHSRDPKVPADEV 563 

C+A+ E SFG+ P + +C+ Y+ ++ ++CKRL+ LCPEH + +V 

Sbjct: 291 LCWARSEKAISFGA--PEK--NNDMFYCEKYDSRTNSFCKRLKSLCPEHRKLGDEQHLKV 346

Query: 564 CGCP LVRDVFELTGDF CRLPKRQCNRHYCWEKLRRAEVDLERVR 607 

CG P V ++ E+ F CR K C++H+ W R ++LE+ 

Sbjct: 347 CGYPKKWEDGMIETAKTVSELIEMEDPFGEEGCRTKKDACHKHHKWIPSLRGTIELEQAC 406 

Query: 608 VWYKLDELFEQ--ERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSA 654

++ K+ EL + + N T A L++M+H+ + + LR+ A 

Sbjct: 407 LFQKMYELCHEMHKLNAHAEWTTNA--LSIMMHKQPSTEKCSFFLRNFA 453


Score = 53 (8.0 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27
Identities = 24/100 (24%), Positives = 41/100 (41%)

Query: 169 CGECEACRRTEDCGHCDFCR DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFPSS 222 

C C C ++CG C CR CM4+K F +K + RQ + + + 

Sbjct: 17 CMNCIRCNDEKNCGTCWPCRNGKTCDMRKCFSAKRLYNEKVK-RQTDENLK-AIMAKTAQ 74 

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264 

+ + P P+ +QQ + +K GR + G A++ 
Sbjct: 75 REAAHQAATTTAPSAPWIEQQVE-KKKRGRKKGSGNGGAAAA 116 

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26
Identities = 13/39 (33%), Positives = 19/39 (48%)

Query: 179 EDCGHCDFCRDMKKFGG--PNKIRQKCRLRQCQLRARESY 216

EC+C CDK G P+ + C +R+C A+ Y 
Sbjct: 15 ERCMNCIRCNDSKMCGTCWPCRNGKTCCH4RKC-FSAKRLY 53 


Pedant information for DKFZphtes3_4f17, frame 3


Report for DKFZphtes3_4f17.3


[LENGTH] 656
[MW] 75711.71
[pI] 8.61
[HOMOL] TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e-25
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 2
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 9
[KW] All_Alpha
[KW] LOW_COMPLEXITY 18.75 %
[KW] COILED_COIL 4.57 %


SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK

SEG , 

PRD cccccccccccccccccccccccccceeeeeeccccceeeeecccccccccccchhhhhh 

COILS 

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR

SEG 

PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc 

COILS 

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED

SEG xxxxxxxxx 

PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc 

COILS 
SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx

PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc

COILS


SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 


COILS 


SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh 


COILS 


SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC 

SEG 

PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch 

COILS 


SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT
*********'***''*••**•♦*••••••••••»«•»••-•••-.«» xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 
COILS 


- cccccccccccccccccccccccccccccc . 


cccccccccc 


SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT

SEG

PRD ceeeeeeeccccccccchhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccc

COILS


SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE

SEG

PRD cchhhhhhhccccccccccceeeeccccchhhhhccccccccccccccchhhhhhhhhh^

COILS


SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc^
COILS 


Prosite for DKFZphtes3_4f17.3


PS00002   124->128   GLYCOSAMINOGLYCAN    PDOC00002
PS00005   58->61     PKC_PHOSPHO_SITE     PDOC00005
PS00005   165->168   PKC_PHOSPHO_SITE     PDOC00005
PS00005   215->218   PKC_PHOSPHO_SITE     PDOC00005
PS00005   248->251   PKC_PHOSPHO_SITE     PDOC00005
PS00005   265->268   PKC_PHOSPHO_SITE     PDOC00005
PS00005   337->340   PKC_PHOSPHO_SITE     PDOC00005
PS00005   387->390   PKC_PHOSPHO_SITE     PDOC00005
PS00005   439->442   PKC_PHOSPHO_SITE     PDOC00005
PS00005   627->630   PKC_PHOSPHO_SITE     PDOC00005
PS00006   6->10      CK2_PHOSPHO_SITE     PDOC00006
PS00006   17->21     CK2_PHOSPHO_SITE     PDOC00006
PS00006   227->231   CK2_PHOSPHO_SITE     PDOC00006
PS00006   265->269   CK2_PHOSPHO_SITE     PDOC00006
PS00006   280->284   CK2_PHOSPHO_SITE     PDOC00006
PS00006   308->312   CK2_PHOSPHO_SITE     PDOC00006
PS00006   521->525   CK2_PHOSPHO_SITE     PDOC00006
PS00006   652->656   CK2_PHOSPHO_SITE     PDOC00006
PS00007   339->346   TYR_PHOSPHO_SITE     PDOC00007
PS00007   500->507   TYR_PHOSPHO_SITE     PDOC00007
PS00007   211->219   TYR_PHOSPHO_SITE     PDOC00007
PS00008   42->48     MYRISTYL             PDOC00008
PS00008   123->129   MYRISTYL             PDOC00008
PS00008   125->131   MYRISTYL             PDOC00008
PS00008   129->135   MYRISTYL             PDOC00008
PS00008   259->265   MYRISTYL             PDOC00008
PS00008   396->402   MYRISTYL             PDOC00008
PS00009   107->111   AMIDATION            PDOC00009
PS00009   425->429   AMIDATION            PDOC00009


(No Pfam data available for DKFZphtes3_4f17.3)
DKFZphtes3_4f5


group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins.

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well
as for membrane anchoring and receptor recognition. In addition, a cytochrome c family
heme-binding site signature is present. The protein is larger (790 amino acids) than the
usual eukaryotic G-beta transducins (about 340 amino acids).

The new protein can find application in modulating/blocking G-protein-dependent pathways.


similarity to S.pombe "beta-transducin" 

complete cDNA, EST hits
complete cds,

on genomic level encoded by HS313D11, at least 7 exons; these exons match
only partially with the predicted transcripts in HS313D11
Sequenced by AGOWA 
Locus: /map="16pl3.3" 
Insert length: 3166 bp 

No poly A stretch found, no polyadenylation signal found 


1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 
51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CT<K;TTCTCG 
101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC aggaaccctg 
151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 
201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 
251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 
301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 
351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 
401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 
451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC 
501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 
551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 
601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG (^GACCAAGCA 
651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG acccacccag 
701 gctgaccagg ccagcccacc TCACTGACCT cctgacccct GACCTCATCA 
751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG tgaccacagc cctgggtggc 
801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATG CTCCCGCCAA 
851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 
901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 
951 AACCTGCGTG TGGGCCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT
1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 
1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 
1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 
1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 
1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 
1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC 
1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG
1351 ACTGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 
1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 
1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG gagatgcact 
1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 
1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 
1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT gaggaacacc 
1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC acccccacga CCCCTCCTTC 
1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 
1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC 
1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT ggctgccgag 
1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 
1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 
1951 tCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCC CTGGTTTGTG 
2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG
2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 
2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC 
2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 
2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 
2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG
2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 
2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG 
2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT
2451 GAGTGCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA 
2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG 
2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC
2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC 
2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC
2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG 
2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA 
2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG 
2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC 
2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCAGCCGGGG 
2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC 
3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC 
3051 GGCCACCTGC AGCACATCAT GAAGTGGCTG GAAGGCAGCT CCCACTGTCC 
3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG 
3151 CTTGCCCGGG CGGCCG


BLAST Results 


Entry HS313D11 from database EMBL: 
Human DNA sequence from cosmid 313D11 
chromosome 16. Contains ESTs, STS and 
Score = 6238, P = O.Oe+00, identities 


from a contig on the short arm of 
CpG islands. 
= 1318/1391 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 


1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL 
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE 
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT 
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD 
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF 
401 ETEPGGGGMP WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL 
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL 
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS 
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV 
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS 

BLASTP hits 

Entry YDSB_SCHPO from database SWISSPROT:

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product:
"beta-transducin"; S.pombe chromosome I cosmid c4F8
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639

Entry PEX7_HUMAN from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7) 

2^™;f ''""^ "Petoxisomriar™ signal 

2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA,
complete cds. >TREMBL:HSU88871_1 gene: "HsPEX7"; product: "HsPex7p";
Human HsPex7p (HsPEX7) mRNA, complete cds.

Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244
Entry PEX7_MOUSE from database SWISSPROT:

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7)
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus
peroxisomal PTS2 receptor mRNA, complete cds.
Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240
Entry ATAC2294_7 from database TREMBL:

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic 
sequence, complete sequence. 

Score = 232, P = 3.4e-14, identities = 68/260, positives = 120/260 
Entry S66835 from database PIR: 

probable membrane protein YOL138c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCYOL138C_1 S. cerevisiae chromosome XV reading frame ORF
YOL138c

Score = 136, P = 2.5e-13, identities = 24/77, positives = 44/77


Alert BLASTP hits for DKFZphtes3_4f5, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_4f5, frame 3


Report for DKFZphtes3_4f5.3


[LENGTH] 790
[MW] 88207.10
[pI] 6.05
[HOMOL] SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN CHROMOSOME I. 9e-44
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 9e-09
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL195w] 2e-07
(FUNCAT) 
2e-07 
[FUNCAT] 
[FUNCAT] 
4e-07 
[FUNCATJ 
[FUNCAT) 
(FUNCAT) 
(FUNCAT] 
[FUNCAT) 
( FUNCAT) 
(FUNCAT) 
(FUNCAT) 
[FUNCAT) 
[FUNCAT) 
[FUNCAT) 
le-O.S 
(FUNCAT) 
palmitylation, 
[FUNCAT) 
[SCOP] 
(PIRKH) 
[PIRKW) 
[PIRKW] 
[PIRKW) 
[PIRKW] 
(PIRKW) 
[PIRKW] 
[SUPFAM) 
(SUPFAM) 
(SUPFAM] 
[SUPFAM) 
[SUPFAM] 
[PROSITE] 
[PROSITE] 
(PROSITE] 
(PROSITE) 
(PROSITE) 
(PROSITE) 
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08.07 vesicular transport (golgi networ)c, etc.) (S. cerevisiae, YDL195w] 

30.19 peroxisomal organization (S. cerevisiae, YDR142c) 4e-07 

06.04 protein targeting, sorting and translocation (S. cerevisiae, yDR142c] 

08.10 peroxisomal transport (S. cerevisiae, YDR142c] 4e-07 
08.01 nuclear transport (S. cerevisiae, YER107ci 4e-07 

04.07 rna transport [S. cerevisiae, YER107c) 4e-07 

30.03 organization of cytoplasm (S. cerevisiae, YER107c] 4e-07 
03-22 cell cycle control and mitosis (S. cerevisiae, YGL003c] 5e-07 
06.13 proteolysis (S. cerevisiae, YGL003C] 5e-07 
04.05.01.04 transcriptional control [S. cerevisiae, YCR084c) 8e-07 
04.05.03 mrna processing (splicing) (S. cerevisiae, YPR178w] le-06 
03.13 meiosis (S. cerevisiae, YLRI29w] 3e-06 

03.25 cytokinesis (S. cerevisiae, YCR057c] le-05 

03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 

06.07 protein modification (glycolsylation, acylation, myristylation, 

farnesylation and processing) (S. cerevisiae, YEL056w) 2e-04 

30.04 organization of cytoskeleton [S, cerevisiae, YOR272w) 6e-04 

dlgotb_ 2.4 6.3.1.1 betal -subunit of the signal- transducing 5e-06 

duplication 7e-10 

signal transduction 7e-08 

peroxisome 9e-06 

heterotrimer 7e-08 

GTP binding 7e-08 

peroxisome biogenesis 9e-06 

transmembrane protein le-14 

HSIl protein 7e-10 

WD repeat homology le-14 

GTP-binding regulatory protein beta chain 7e-08 

PRLl protein 3e-08 

coatomer complex beta' chain le-06 

CYTOCHROME_C 1 

WD__REPEATS 3 

MYRISTYL 10 

AMI DAT I ON 2 

CAMP PHOSPHORS ITE 2 

CK2_PH0SPH0 SITE 11 




[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] WD domain, G-beta repeats 
[KW] All_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 2.28 % 

SEQ MEKMSRVTTALGGSVLTGRTMHCHLDAPANAISVCRDAAQVVVAGRSIFKIYAIEEEQFV 

igotB - . * i i ' 

SEQ EKLNLRVGRKPSLNLSCADVVWHQMDENLLATAATNGVWTWNLGRPSRNKQDQLFTEHK 

'^^^^^ TTCEEEEEETTTEEEEEET-TTTCEEE— EEECCC 

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVOFSIRDYFTFA 

IgotB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTT^^ 

SEQ STFENGNVQLWDIRRPDRCERMFTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWDMTTH 

IgotB E-ETTTEEEEEETTTTEEEE-EEECCCCCEE^ .* ; * ; 

SEQ RAKEMHCVQTIASVARVKWRPECRHHLATCSMMVDHNIYVWDVRRPFVPAAMFEEHRDVT 
SEG ••••••.»-......,.. 

IgotB 

SEQ TGIAWRHPHDPSFLLSGSKDSSLCQHLFRDASQPVERANPEGLCYGLFGDIAFAAKESLV 

SEG •••••*•••..««..,.,., 

IgotB 


SEQ AAESGRKPYTGDRRHPIE 

SEG 

IgotB 


E FFKRKLDPAEPFAGLASSALS VFETEPGGGGMRWFVDTMRYA 


SEQ lAGRPLAELCDHNAKVARELGRNQVAQTWTMLRIIYCSPGLVPTANLNHSVGKGGSCGLP 

IgotB ••••• ^ ■ i i i 

SEQ LMNSFNLKDMAPGLGSETRLDRSKGDARSDTVLLDSSATLITNEDNEETEGSDVPADYLL 

IgotB 

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPQEAFPLRHEIVDTPPGPEHLQDKADSPHVSGS 

IgotB '/.'.'.'/,[['.'. 

SEQ EADVASLAPVDSSFSLLSVSHALYDSRLPPDFFGVLVRDMLHFYAEQGDVQMAVSVLIVL 

IgotB i i i i i ! ! 

SEQ GERVRKDIDEQTQEHWYTSYIDLLQRFRLWNVSNEWKLSTSRAVSCLNQASTTLHVNCS 

IgotB • - i i i i ! i 

SEQ HCKRPMSSRGWVCDRCHRCASMCAVCHHWKGLFVWCQGCSHGGHLQHIMKWLEGSSHCP 

IgotB 

SEQ AGCGHLCEYS 

SEG 

IgotB 


Prosite for DKFZphtes3_4f5.3 


PS00001 74->78 ASN_GLYCOSYLATION PDOC00001 
PS00001 468->472 ASN_GLYCOSYLATION PDOC00001 
PS00001 691->695 ASN_GLYCOSYLATION PDOC00001 
PS00001 718->722 ASN_GLYCOSYLATION PDOC00001 
PS00004 69->73 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 152->156 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 17->20 PKC_PHOSPHO_SITE PDOC00005 
PS00005 165->168 PKC_PHOSPHO_SITE PDOC00005 
PS00005 172->175 PKC_PHOSPHO_SITE PDOC00005 
PS00005 239->242 PKC_PHOSPHO_SITE PDOC00005 
PS00005 364->367 PKC_PHOSPHO_SITE PDOC00005 
PS00005 701->704 PKC_PHOSPHO_SITE PDOC00005 
PS00005 727->730 PKC_PHOSPHO_SITE PDOC00005 
PS00006 76->80 CK2_PHOSPHO_SITE PDOC00006 
PS00006 165->169 CK2_PHOSPHO_SITE PDOC00006 
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00006 398->402 CK2_PHOSPHO_SITE PDOC00006 
PS00006 498->502 CK2_PHOSPHO_SITE PDOC00006 
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006 
PS00006 522->526 CK2_PHOSPHO_SITE PDOC00006 
PS00006 598->602 CK2_PHOSPHO_SITE PDOC00006 
PS00006 600->604 CK2_PHOSPHO_SITE PDOC00006 
PS00006 679->683 CK2_PHOSPHO_SITE PDOC00006 
PS00007 337->346 TYR_PHOSPHO_SITE PDOC00007 
PS00008 13->19 MYRISTYL PDOC00008 
PS00008 97->103 MYRISTYL PDOC00008 
PS00008 139->145 MYRISTYL PDOC00008 
PS00008 161->167 MYRISTYL PDOC00008 
PS00008 317->323 MYRISTYL PDOC00008 
PS00008 342->348 MYRISTYL PDOC00008 
PS00008 391->397 MYRISTYL PDOC00008 
PS00008 460->466 MYRISTYL PDOC00008 
PS00008 474->480 MYRISTYL PDOC00008 
PS00008 759->765 MYRISTYL PDOC00008 
PS00009 67->71 AMIDATION PDOC00009 
PS00009 364->368 AMIDATION PDOC00009 
PS00190 743->749 CYTOCHROME_C PDOC00169 
PS00678 90->105 WD_REPEATS PDOC00574 
PS00678 223->238 WD_REPEATS PDOC00574 
PS00678 269->284 WD_REPEATS PDOC00574 

Pfam for DKFZphtes3_4f5.3 


HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

++ HN++V C+ ++P+ R +++G+-»-D+ +++WD 
Query 203 FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236 
DKFZphtes3_4h6 


group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP hydrolysis. The new protein is similar to kinesin light chain, which is part 
of the kinesin holoenzyme, a tetrameric protein. The light chain has been proposed to 
function in coupling of cargo to the heavy chain or in the modulation of the ATPase activity. 
The new protein contains two kinesin light chain repeats and one RGD cell-attachment site. 

The new kinesin protein can find application in modulating the function of kinesin and 
modulating intracellular transport via/on microtubules. 

strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 


Locus : unknown 


Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893 


1 GGCGGGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 
51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 
101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 
151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 
201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 
251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 
301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 
351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 
401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GGCGCAGGTG CGGCGTCTGG 
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG 
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCCAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGAGAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTCTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA 
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 
1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 
1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG AGAAGGTCCT GGGCAAGTTT 
1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 
1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 
1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 
1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA 
1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 
1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 
1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 
1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 
1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 
1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 
1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 
1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 
1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 
1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC 
1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 
1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 
1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 
1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 
2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 
2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 
2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 
2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 
2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 
2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 
2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 
2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 
2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 
2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGGCC TTAATCACCC 

2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 

2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 

2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 

2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 

2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 

2751 CACCGCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 

2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 

2851 GCGGGTGAGG CGGCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 

2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 


BLAST Results 


No BLAST result 


Medline entries 


98288268: 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 


Peptide information for frame 3 


ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265) 
KINESIN_LIGHT (265-307) 


1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 
51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 
101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 
151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 
201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 
251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 
301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 
351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 
401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 
451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG 
501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 
551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 
601 LSDSRTLSSS SMDLSRRSSL VG 
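
The ORF coordinates quoted above can be sanity-checked against the cDNA listing. The following is a minimal sketch, not the method used to produce the report: it assumes 1-based inclusive coordinates and the standard genetic code, and the helper names (`orf_peptide_length`, `translate`) are hypothetical. It checks how many codons an ORF from 144 bp to 2009 bp spans and translates the bases that start at position 144.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates (assumed convention)."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate DNA codon by codon; blanks from the 10-base grouping are ignored."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 144-173 of the DKFZphtes3_4h6 insert, copied from the nucleotide listing.
orf_start = "ATGGCCA TGATGGTGTT TCCGCGGGAG GAG"
print(orf_peptide_length(144, 2009))  # codons spanned by the ORF
print(translate(orf_start))           # first ten residues of frame 3
```

The same length check applies to any other clone in this listing whose ORF coordinates and peptide length are both stated.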

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score 
= 2824, P = 4e-294 

PIR:I53013 kinesin light chain - human, N = 1, Score = 1927, P = 
4.5e-199 


PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P = 
3.2e-198 


SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P = 
3.2e-198 


>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 

HSPs : 
Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294 
Identities = 558/598 (93%), Positives = 572/598 (95%) 


Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct : 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 

Query: 

Sbjct: 


1 
1 
61 
61 
121 
121 
181 
180 
241 
240 
301 
300 
361 
360 
421 
420 
481 
479 
541 
535 


MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60 
MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL + EAGEAEPGSQERC+L 
MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60 

LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 
LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 
LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 

QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 180 
QRSEQAVAQLEEEKQHLLFMSQIRKLDE P EEKGDVPKD+LDDLFPNEDEQSPAPSP 
QRSEQAVAQLEEEKQHLLFMSQIRKLDE-MLPQEEKGDVPKDSLDDLFPNEDEQSPAPSP 179 

GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240 
n3n^T. Q"GGyEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 
GGGDVAAQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPD^^ 239 

TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 300 
TMLNTLALVYRDQNKYK4AAHLLNDALAIREKTLGKDHPAVAATLNNIAVLYGKRGKYKE 

tmlnilalvyrdqnkykdaahllndaij^irektlgkdhpavaatlnniavlS^ 299 

AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 360 

aeplckraleirekvlgkfhpdvakqlsnlallcqnqgkaeeveyyyrraleiyatr^^ 
aeplckraleirekvlgkfhpdvakqlsnlallcqn5gkaeevey^;Sr^l"^ 359 

DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 420 
DDPNVAKTKNNU^SCYLKQGKYQDAETLYKEILTRAHEKEFGSVNG+NKPlS^ 
DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENKPIwJff^ 419 

SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 480 
SKDKRRD P EYGSWYKACKVDSPTVNTTLR^LGALYR +GKLEAaStLeC^AS^S 
SKDKRRDRRPM-EYGSWYKACKVDSPTVNTTLRTLGALYRPEGKLEAMTLEDCAS^ 478 

QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 540 
QGLDPASQTKVVELLKDGSGR G RR SRD^AG P+SESDLE4 GrAElHGDGSGSL 
QGLDPASQTKWELLKEXJSGR-GHRRGSRDVAG PQSESDLEESGPAAEWSGDGSGSL 534 

RRSGSFGKLRDALRRSSEMLVRKLQGGGPQEP-NSRMKRASSLNFLNKSVEEPVQPGG 591 
Pedant information for DKFZphtes3_4h6, frame 3 


Report for DKFZphtes3_4h6.3 


[LENGTH] 622 
[MW] 68934.82 
[pI] 6.72 
[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0 
[BLOCKS] BL00927C Trehalase proteins 
[BLOCKS] BL01160I Kinesin light chain repeat proteins 
[BLOCKS] BL01160H Kinesin light chain repeat proteins 
[BLOCKS] BL01160G Kinesin light chain repeat proteins 
[BLOCKS] BL01160F Kinesin light chain repeat proteins 
[BLOCKS] BL01160E Kinesin light chain repeat proteins 
[BLOCKS] BL01160D Kinesin light chain repeat proteins 
[BLOCKS] BL01160C Kinesin light chain repeat proteins 
[BLOCKS] BL01160B Kinesin light chain repeat proteins 
[BLOCKS] BL01160A Kinesin light chain repeat proteins 
[SUPFAM] tetratricopeptide repeat homology 1e-07 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 8 
[PROSITE] KINESIN_LIGHT 2 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 5 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Kinesin light chain repeat 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 12.54 % 
[KW] COILED_COIL 4.98 % 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 

SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 

COILS 

SEQ 
SEG 
PRD 

COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 


MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 
ccccchhhhhhhhhhhhhchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccchhhhh 


LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 

hhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhh 
- CCCCCCCCCCCC 

QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 

hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccccc 

CCCCCCCCCCCCCCCCCCC 

GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 
cccccccccccccchhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhhhhccccccchh 


TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 

xxxxxxxxxxxx 

hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhcccccchh 


AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 
hhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhccc 

DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 

xxxxx 

ccccccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhh 


SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 

xxxxxxxx 

hhhhhccccccccccccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 


QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 

xxxxxxxxxxxxxx xxxxx 

hhccchhhhhhhhhhccccccccccccccccccccccccccccccceeeecccccccccc 


RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGGTG 

xxxxxxxxxx xxxx 

ccccccchhhhhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccc 


LSDSRTLSSSSMDLSRRSSLVG 
xxxxxxxxxxxxxxxxxxxx. . 
cccccccccccchhhhhhcccc 
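
The [KW] percentages in the Pedant report for this clone appear to be simple residue fractions. As a hedged check (the counting rule is an assumption, not something the report states, and the helper name is hypothetical), the sketch below counts the `x` marks on the SEG rows and the coiled-coil marks on the COILS rows of the listing above, the latter normalized to `C`, and divides by the peptide length of 622.

```python
# Mark strings copied from the SEG and COILS rows of the listing above.
SEG_MARKS = [
    "xxxxxxxxxxxx",
    "xxxxx",
    "xxxxxxxx",
    "xxxxxxxxxxxxxx xxxxx",
    "xxxxxxxxxx xxxx",
    "xxxxxxxxxxxxxxxxxxxx. .",
]
COILS_MARKS = ["CCCCCCCCCCCC", "CCCCCCCCCCCCCCCCCCC"]
PEPTIDE_LENGTH = 622


def marked_percent(rows, mark, length):
    """Percentage of residues flagged with `mark`, rounded to two decimals."""
    flagged = sum(row.count(mark) for row in rows)
    return round(100.0 * flagged / length, 2)


low_complexity = marked_percent(SEG_MARKS, "x", PEPTIDE_LENGTH)
coiled_coil = marked_percent(COILS_MARKS, "C", PEPTIDE_LENGTH)
print(low_complexity, coiled_coil)
```

Under this assumption the two values can be compared directly with the LOW_COMPLEXITY and COILED_COIL figures reported for DKFZphtes3_4h6.3.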


Prosite for DKFZphtes3_4h6.3 


PS00001 449->453 ASN_GLYCOSYLATION PDOC00001 
PS00001 587->591 ASN_GLYCOSYLATION PDOC00001 
PS00004 425->429 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 505->509 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 554->558 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 578->582 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 616->620 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 451->454 PKC_PHOSPHO_SITE PDOC00005 
PS00005 499->502 PKC_PHOSPHO_SITE PDOC00005 
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005 
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005 
PS00005 615->618 PKC_PHOSPHO_SITE PDOC00005 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 151->155 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 232->236 CK2_PHOSPHO_SITE PDOC00006 
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006 
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 521->525 CK2_PHOSPHO_SITE PDOC00006 
PS00006 568->572 CK2_PHOSPHO_SITE PDOC00006 
PS00006 589->593 CK2_PHOSPHO_SITE PDOC00006 
PS00006 610->614 CK2_PHOSPHO_SITE PDOC00006 
PS00007 339->346 TYR_PHOSPHO_SITE PDOC00007 
PS00007 339->347 TYR_PHOSPHO_SITE PDOC00007 
PS00007 424->432 TYR_PHOSPHO_SITE PDOC00007 
PS00008 71->77 MYRISTYL PDOC00008 
PS00008 86->92 MYRISTYL PDOC00008 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 187->193 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 482->488 MYRISTYL PDOC00008 
PS00008 598->604 MYRISTYL PDOC00008 
PS00008 600->606 MYRISTYL PDOC00008 
PS00009 292->296 AMIDATION PDOC00009 
PS00009 499->503 AMIDATION PDOC00009 
PS00016 502->505 RGD PDOC00016 
PS01160 223->265 KINESIN_LIGHT PDOC00893 
PS01160 265->307 KINESIN_LIGHT PDOC00893 
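
The positions in the Prosite listing above can be reproduced with ordinary regular expressions. The sketch below is an illustration under stated assumptions, not the tool that generated the report: the patterns are the PROSITE consensus patterns as commonly published for these entries, the helper `scan` is hypothetical, and positions are printed as 1-based start and exclusive end, which is the convention the listing appears to use (for example RGD at 502->505). Only residues 451-550 of the peptide are scanned to keep the example short.

```python
import re

# PROSITE-style consensus patterns written as Python regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "AMIDATION": r".G[RK][RK]",
    "RGD": r"RGD",
}

# Residues 451-550 of the DKFZphtes3_4h6 peptide (frame 3).
FRAGMENT = ("TLRSLGALYRRQGKLEAAHTLEDCASRNRKQGLDPASQTKVVELLKDGSG"
            "RRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSLRRSGSFGKLR")
FRAGMENT_START = 451  # 1-based position of the first residue in FRAGMENT


def scan(sequence, pattern, first_position=1):
    """Return 'start->end' strings for every match, overlapping matches included."""
    hits = []
    # A lookahead keeps the regex engine from skipping overlapping sites.
    for match in re.finditer(f"(?=({pattern}))", sequence):
        start = match.start() + first_position
        end = start + len(match.group(1))
        hits.append(f"{start}->{end}")
    return hits


for name, pattern in PATTERNS.items():
    print(name, scan(FRAGMENT, pattern, FRAGMENT_START))
```

The lookahead matters here because sites such as the CK2 matches at 519 and 521 overlap; a plain `re.finditer` over the bare pattern would report only the first of them.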


Pfam for DKFZphtes3_4h6.3 


HMM_NAME Kinesin light chain repeat 

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

+ALED+EKT+GHDHPDVATMLN+LALV+R+QNKY+E++ ++N 
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264 

dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 

Alignment to HMM consensus: 

Query *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
AL +REKTLG DHP VA LNNLA+++ ++KY+E+E + + 
dkfzphtes3 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306 


dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY+ 
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348 

dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 

Query *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
RALE+ LG D P+VA+ NNLA + Q+KY+++E +Y+ 
dkfzphtes3 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390 
DKFZphtes3_4ol9 


group: testes derived 

DKFZphtes3_4ol9 encodes a novel 1180 amino acid protein with weak similarity to human 
megakaryocyte stimulating factor and human mucin. 

The novel protein contains a cytochrome c family heme-binding site signature. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to megakaryocyte stimulating factor and mucin 
complete cDNA, complete cds, EST hits (few) 
Sequenced by AGOWA 
Locus: unknown 

Insert length: 3767 bp 

Poly A stretch at pos. 3757, polyadenylation signal at pos. 3737 


1 GGCTAGGTTT AGCTTCAGGG GCAGCCCAGG GCAGTGTTGC TGCATATTGC 
51 ATGGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 
101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 
151 AGCTGACCTG TCCGGTAATC AAGGCAATGC AGCCGGCCGC CTAGCTACAG 
201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 
251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 
301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 
351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGGCTGTGGT CGAGAGCCAG 
401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 
451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC 
501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 
551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 
601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 
651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 
701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGGTCCTCT GCAGACCCCA 
751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT 
801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGCGGCTGGC CTTCCTGCCA 
851 CACCAGACGG TCACCATCAG ATTTCCCTGC CCAGTGAGTT TGGACGCAAA 
901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 
951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 
1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAAGCCAGGG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCA7«\ 
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA GAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA 
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC 
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTGT CTGGCCTCTC 
1701 CAACAGTGGC AAATGTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC 
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA 
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT 
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA 
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA 
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT GCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGGTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG 
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT 
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG 
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT 
2501 GGCCTCAGCG CCCCACCCTG GGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC 
2601 CGGCAGCCTG TGAGGTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC 
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG GGAGACAACG GAGCCACACG 
2701 TGCCCAGCCA TCAATGCCCG GCCAGGCGGT GCCCTGCCAG GAGGACACGG 
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC AATCGTGGAA CCGCGCATGG 
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC ACCTGGCGCA ACAAGGCGGT 
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT GGTGTCCATG CAGGCTGCAG 
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA TCCAGGCGGG CGTCCGTGGC 
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG CACCGGGGGG CCATGGTCAT 
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG GCGGAACCTG GCACACCTCT 
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT GGCGCGGCTA CAGCACCCGC 
3101 CGGGACCAAG CCCGGCACTG GCAGATGCTC CACCCCGTCA CGTGGGTGGA 
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA CCGAAGCTGG TTCCAGGATG 
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT GCTTCCAGTC CTGCCAGGCA 
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC TCCAGGATCG GGAGCCCGCC 
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC TCGCACCTGT CATACCTGTG 
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG GCATGGGCCA GGGCACTGAG 
3401 GGCCCCGGGG CAGTGTCTTG GGCCTCCGCC TACCAGCTGG CTGCCCTGAG 
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC GGCCACAGCC ATCCAGTCCG 
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA TGAGGCAGCA GCAAATGGCA 
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC CACCATACCC GGAGCTGTCT 
3601 GAAGAACACA GAGGCGCTCT TGGGACCAGC AGACCCCTCG GCCAGCTCAC 
3651 GGCACATGCA TTGGCCTGGC ATCTAGGACC CTGGCTCCCT GCAGTGGGGA 
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC 
3751 ACAGCCTAAA AAAAAAA 
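
The poly A and polyadenylation signal positions reported for this clone can be located by plain string search. The sketch below is illustrative only: it assumes the canonical AATAAA hexamer as the signal and a terminal run of at least eight A residues as the poly A stretch (both are assumptions, not stated in the report), and the helper name is hypothetical. Run on the last 67 bases of the listing above, it returns zero-based offsets into the full insert, which can be compared with the positions 3737 and 3757 quoted for DKFZphtes3_4ol9.

```python
import re

# Bases 3701-3767 of the DKFZphtes3_4ol9 insert, copied from the listing above.
TAIL = ("CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC"
        "ACAGCCTAAA AAAAAAA").replace(" ", "")
TAIL_OFFSET = 3700  # zero-based offset of TAIL[0] within the full insert


def find_polya_features(seq, offset=0, signal="AATAAA", min_run=8):
    """Locate the last polyadenylation signal upstream of a terminal poly A run.

    Returns zero-based (signal_position, polya_position), or None when either
    feature is missing.
    """
    run = re.search(r"A{%d,}$" % min_run, seq)
    if run is None:
        return None
    signal_at = seq.rfind(signal, 0, run.start())
    if signal_at < 0:
        return None
    return signal_at + offset, run.start() + offset


print(find_polya_features(TAIL, TAIL_OFFSET))
```

Note that the ORF coordinates in the same report are 1-based, so the two kinds of position should not be mixed without adjusting by one.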


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 


1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA 
51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD 
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK 
151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL 
201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP 
251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR 
301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL 
351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM 
401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM 
451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM 
501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN 
551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP 
601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP 
651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL 
701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT 
751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE 
801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG 
851 DNGATRAQPS MPGQAVPCQE DTGPADAGVV GGQSWNRAWE PARGAASWDT 
901 WRNKAVVPPR RSGEPMVSMQ AAEEIRILAV ITIQAGVRGY LARRRIRLWH 
951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH 
1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS 
1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGMGQGTEG PGAVSWASAY 
1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH 
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2 

TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds., N = 2, Score = 
242, P = 9.6e-16 

TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene, 
partial cds., N = 1, Score = 204, P = 1.4e-12 

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11 


>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds. 
Length = 1,404 

HSPs: 

Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 145/546 (26%), Positives = 198/546 (36%) 

Query: 282 KRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAETPKAPFQIC-PGPMITKTLL 340 

K+ + T K AP TP PS + P T AP P P TK+ 

Sbjct: 488 KKPAPTTPKEPAPTTP-KEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAP 546 

Query: 341 QTYPWSVTLPQ TYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTKTAPHTC 395 

T S T f TP TTP K +P PK TP + P PT TK 
Sbjct: 547 TTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKE—PAPTTTKK 599 

Query: 396 PMPTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPP 455 

P PT K + PT TP++T P T LA P +A T P 

Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653 

Query: 456 QMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQT-RLPAMIT-KTPAQLRSVAT 513 

+ TTP + P T A T + +P +P+P T + PA T K A T 
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKEPAPTTPKETAPTTPKGT 712 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAG TPNTSGSIHENPPKAKATVNVKQAAKW-KA 569 

TL +PT AP ++A T TS PK A K+ A K 

Sbjct: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKFAPTTPKGTAPTTPKEPAPTTPKE 772 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGT— QKQAKTDMAFKTSVAVE 627 

+P+ l+ppt a el KTT KAT +T+ 

Sbjct: 773 PAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTTTKGPTSTTSDKPAPTTPK-ETAPTTP 831 

Query: 628 MAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQEUIPPCLSQRPLAAPL 687 

AP+ K+ P P V+P +S PLSP L 

Sbjct: 832 KEPAPTTPK--KPAPTTPETPPPTTSEVSTPTTTKEPTTIHKSPDESTPELSAEPTPKAL 889 

Query: 688 TKASSQGHLPTELTKTPSLA— HLDTCLSKMHSQTHLATGAVKVQSQAPLAT— CLTKTQ 743 

+ + +PT TKTP+ + T ++ L T + + AP T T T+ 

Sbjct: 890 ENSPKEPGVPT—TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946 

Query: 744 SRGQPITDITTCLIPAHQAADLS— SNTHSQVLLTGSKVSN — HACQRLGGLSAPP-WAK 798 

+ TT++ D+ T+ KV+ ++ P AK 

Sbjct: 947 KTTESKITATTTQVTSTTTQDTTPFKITTLKTTTLAPKVTTTKKTITTTEIMNKPEETAK 1006 

Query: 799 PEDRQTQPQPHGHVPGKTTQGGPCPAA 825 

P+DR T + P K T+ P + 

Sbjct: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033 

Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 146/565 (25%), Positives = 209/565 (36%) 

Query: 281 TKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAE— TPKAPFQICPGPMITKT 338 

TK+ + KAPTP +ATP+ PK TP+ P P + T 

Sbjct: 597 TKKPAPTAPKEPAPTTPK ETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTT 652 

Query: 339 LLQTYPVVSVTLPQTYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTK-TAPHTCP 396 

+ P T P + TP + +P PK TP + P PT K TAP T P 

Sbjct: 653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE — PAPTTPKETAP-TTP 709 

Query: 397 M PTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453 

PT K + PT + P++ PT +S+KP GAT 

Sbjct: 710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTSD — KPAPTTPKGTAPT-T 761 

Query: 454 PPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQTRLPAMITKTPAQLRSVAT 513 
P + P TTP KPT T T + +P KP+P+ P TK P S 

Sbjct: 762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP 818 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAGTPNTSGSIHENPPKAKATVNV KQAAKWKA 569 

T +PT AP APT E PP + V+ K+ + K+ 

Sbjct: 819 APTTPKETAPTTPKEPAPTTPKKPA — PTTP ETPPPTTSEVSTPTTTKEPTTIHKS 872 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGTQKQAKTDMAFKTSVAV 626 

S+P AE + L GVP + p + T T K T+ +T+ 
Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP-TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930 

Query: 627 EMAGAPSWTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 
A AP TK A +K + +T Q + T ++ r ta 
Sbjct: 931 TTA-APKMTKETATTTEKT TESKITATTTQVTSTTTQDTTPFKITTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 
+T + + TE+ P +T K + AT K Q + P +T 
Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQKPTKAPKKPTSTKKP 1037 

Query: 741 KTQSR-GQPITDIT TCLIPAHQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 
KT R +p T T T +P + Q ++ N + S 
Sbjct: 1038 KTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVNPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845 
+PH +P T P QG+++ PM + CN 
Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDKDYLPRVPN-QGIIINPMLSDETNICN 1147 

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11 
Identities = 142/513 (27%), Positives = 200/513 (38%) 


Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263 
G + H V+ + +p L 
Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 R-TIRSTCLVHIEGDSVKTKRVSARTNKARAP ETPLSRRYDQAVTRPSR--AQTQ 315 
T + T L + +V+TK + TNK + E S + Q++ + s A T 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 
+ TPKA GP +T T + P T P+ PAST TP + +P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE-PAST--TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC-PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432 
TP + P PT TK+AP T P PT TK + PT + F T PA T K+ P 
Sbjct: 376 TTPKE--PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432 

Query: 433 LLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVT 489 
K P PA TP + p TTP KPT + T + +P 
Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TTP-KEPAPTTKEPAPT-TPKEPAPTAPK 488 

Query: 540 
l + PATKPA + T K T ++PT AP AT 
Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 
P S++P PKA K+A K +P+ E +P P P+ 
Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA-PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 
K + AP+ + +A + P p + 
Sbjct: 607 EPA--PTTPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTPEEPTPTTPEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 
+PP+PAPT PET T 
Sbjct: 665 APTTPKA--AAPNT PKEPAPTTPKEP--APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 
Sbjct: 717 LKE 719 


Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02 
Identities = 60/214 (28%), Positives = 85/214 (39%) 


Query: 265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA--RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 
T + +H D T +SA T KA +P+ p + a T+P T 
Sbjct: 862 TTKEPTTIHKSPDE-STPELSAEPTPKALENSPKEPGVPTTKTPAATKPEKTTTAKDKTT 920 

Query: 321 ETP--KAPFQICPGPMITK-TLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTIIKT 377 
E P p +TK T T + T T TTT T+P K+T +kt 
Sbjct: 921 ERDLRTTPETTTAAPKMTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978 

Query: 378 PAQMYPGPTVTK--TAPHTCPMPTMT-KIQVHPTASRTGTPRQTCPATITAKNRPQVSL 433 
PTTK T PTK+ TS+ TP+ P A +P + 
Sbjct: 979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPKDRATNSKATTPKPQKPTK-APKKPTSTK 1035 

Query: 434 LASIMKSL--PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 
M + P+ P P M T P+++P + A+ LQT 
Sbjct: 1036 KPKTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQT 1075 

Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 17/60 (28%), Positives = 22/60 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P PS E AP P+ + K+ P p E + + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592 

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 17/59 (28%), Positives = 22/59 (37%) 

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA ++ 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489 

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15 
Identities = 15/51 (29%), Positives = 19/51 (37%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71 

T EP T P P P+ + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 12/41 (29%), Positives = 17/41 (41%) 

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 
P P P + P +P +KS P++PA T S 

Sbjct: 350 PTPTTPK--EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 15/57 (26%), Positives = 19/57 (33%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77 

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433 

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15 
Identities = 16/58 (27%), Positives = 22/58 (37%) 

Query: 20 LATVHEPVVT QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74 

L T EP T + A P P+ + P +P KS P+-I-PA T 

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401 

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14 
Identities = 15/60 (25%), Positives = 21/60 (35%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P P+ + AP P+ + KE P E + + P 

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522 

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14 
Identities = 15/55 (27%), Positives = 20/55 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT PKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKS 544 


Pedant information for DKFZphtes3_4ol9, frame 2 


Report for DKFZphtes3_4ol9.2 

[LENGTH] 1180 
[MW] 127693.40 
[pI] 10.25 
[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08 
[FUNCAT] 98 Classification not yet clear-cut [S. cerevisiae, YJR151c] 6e-06 
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06 
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06 
[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 12 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] PKC_PHOSPHO_SITE 25 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 5.00 % 

SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK 

PRD cccccceeeccccccceeeeeeeeceeeeeeeecccccccceeeeccccccccccccc^ 

SEQ SKEHLPQQPAEGKTASRRVPRLRAVVESQAFKNILVDEMDMMHARAATLIQANWRGYWLR 
SEG 

PRD cccccccccccccccccchhhhhhhhhhhhhhheeehhhhhhhhh^ 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR 
SBG , 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeecccchhhhhhhhcccccccccceeeeccccc^ 

SEQ Li-SPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH 
SEG - 

PRD eeccceeeecccccccccceeeecccccccccccccccccccccccccceee^ 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSR 
SEG 

PRD eeeeeecccccccccccccccccccccceeeeecccccccceeeeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT 
"*"■••••■••••••-••.-..•.............,,. xxxx 

PRD ccceeeeeccccccccceeecccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccceeeccccccccccccccccccccccccccceeecccccccccccccc^ 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccc^ 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP 

SEG xxxxxxxxxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG 

SEG xxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeccccccceeecccccccccc^ 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT 

SEG *••••--•-•.••.•.........,........ 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAHQAADLSSNTHSQVLLTGSKV 

PRD ccccceeeeeccccceeeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccc^ 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGWGGQSWNRAWEPARGAASWDT 

SEG _^ 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ WRNKAWPPRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW 

SEG 

PRD ccceeecccccccccchhhhhhhhhhhhhhhhccccchhhhh^ 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF 

SEG 

PRD ^ihhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhh^ 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSWMLVGSSPRTCHTCGRTQPTRV 
SEG 

PRD hccceeeeccceeeecccceeeeeeeecccccccccceeeeeeccccccccccccc^ 

SEQ VQGMGQGTEGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA 

XXXXXXXXXXXXX 

PRD eeeccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI 

SEG XX 

PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc 

PS00001   542->546     ASN_GLYCOSYLATION   PDOC00001 
PS00001   668->672     ASN_GLYCOSYLATION   PDOC00001 
PS00004   282->286     CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   76->79       PKC_PHOSPHO_SITE    PDOC00005 
PS00005   148->151     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   244->247     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   265->268     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   278->281     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   281->284     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   285->288     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   288->291     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   299->302     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   322->325     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   414->417     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   424->427     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   481->484     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   610->613     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   671->674     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   679->682     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   900->903     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   959->962     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   987->990     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1015->1018   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1049->1052   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1065->1068   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1106->1109   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1146->1149   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   1171->1174   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   22->26       CK2_PHOSPHO_SITE    PDOC00006 
PS00006   42->46       CK2_PHOSPHO_SITE    PDOC00006 
PS00006   156->160     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   546->550     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   848->852     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   988->992     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1003->1007   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1027->1031   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   11->17       MYRISTYL            PDOC00008 
PS00008   14->20       MYRISTYL            PDOC00008 
PS00008   539->545     MYRISTYL            PDOC00008 
PS00008   591->597     MYRISTYL            PDOC00008 
PS00008   746->752     MYRISTYL            PDOC00008 
PS00008   777->783     MYRISTYL            PDOC00008 
PS00008   853->859     MYRISTYL            PDOC00008 
PS00008   878->884     MYRISTYL            PDOC00008 
PS00008   882->888     MYRISTYL            PDOC00008 
PS00008   1008->1014   MYRISTYL            PDOC00008 
PS00008   1053->1059   MYRISTYL            PDOC00008 
PS00008   1083->1089   MYRISTYL            PDOC00008 
PS00190   1042->1048   CYTOCHROME_C        PDOC00169 



(No Pfam data available for DKFZphtes3_4ol9.2) 

DKFZphtes3_50j4 

group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown, proline-rich protein 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 

1 CACTGGGCGT CTGAAGCTCA GAGCTCACCC CTGAGATGGG CTCTCCTAGG 
51 CCTCCTGGGA TGAGGGAGCC ACCAGGACCC AGTGCTGTGA TGCCTGCTCT 

101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGGGCACC CCTGAAGTCC 

151 AGCCCACCCC TGCAAAGGAC ACATGGAAGG GCAAGCGGCC TCGATCCCAG 

201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGGCCACGCC CCTCAGCCAA 

251 GCCCTCCGTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCGAACAGG 

301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCrCCTGGC 

351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 

401 CTACAAGGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 

451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 

501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 

551 GTGCGAGAGC GAAGCTGACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 

601 CAACTGCTGG CTGGGGAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 

651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 

701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 

751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 

801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG GGACACATGT 

851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 

901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 

951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 
1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGGTCAG AGCTGTGGCA 
1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 
1101 CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 
1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 


ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 


1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 

51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 

101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS 

151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 
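
The reported ORF coordinates and peptide length can be cross-checked arithmetically. The following is a minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon; the function name `peptide_length` is illustrative and not part of the original annotation pipeline.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based inclusive coordinates that cover coding codons only
    (the stop codon is not included in the span).
    """
    n_nt = orf_end - orf_start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError(f"ORF span of {n_nt} nt is not a positive multiple of 3")
    return n_nt // 3


# ORF from 36 bp to 596 bp: 561 nt / 3 = 187 residues
print(peptide_length(36, 596))  # 187
```

Under these assumptions the span 36 to 596 gives 187 codons, in agreement with the stated peptide length.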

BLASTP hits 

Entry MMU92455_1 from database TREMBL: 
product: "WW domain binding protein 7"; Mus musculus WW domain binding 
protein 7 mRNA, partial cds. 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125 


Alert BLASTP hits for DKFZphtes3_50j4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 


Report for DKFZphtes3_50j4.3 


[LENGTH] 187 
[MW] 20353.06 
[pI] 9.76 
[PROSITE] MYRISTYL 1 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] PKC_PHOSPHO_SITE 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 8.56 % 
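
The LOW_COMPLEXITY figure appears to be the share of residues masked in the SEG line of the report (the run of 16 `x` characters under the first sequence row). A minimal sketch under that assumption; the helper name `low_complexity_percent` is hypothetical and the interpretation of the keyword is inferred from the numbers, not stated in the report.

```python
def low_complexity_percent(n_masked: int, length: int) -> float:
    """Percentage of residues masked as low complexity, rounded to 2 decimals.

    Assumption: the report's LOW_COMPLEXITY value is masked residues / length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * n_masked / length, 2)


# 16 masked residues in a 187-residue peptide
print(low_complexity_percent(16, 187))  # 8.56
```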


SEQ MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE 
SEG xxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT 
SEG 
PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc 

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH 
SEG 
PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh 

SEQ GLCGPQR 
SEG 
PRD ccccccc 

PS00005   3->6       PKC_PHOSPHO_SITE   PDOC00005 
PS00005   46->49     PKC_PHOSPHO_SITE   PDOC00005 
PS00005   70->73     PKC_PHOSPHO_SITE   PDOC00005 
PS00005   107->110   PKC_PHOSPHO_SITE   PDOC00005 
PS00005   146->149   PKC_PHOSPHO_SITE   PDOC00005 
PS00005   154->157   PKC_PHOSPHO_SITE   PDOC00005 
PS00006   54->58     CK2_PHOSPHO_SITE   PDOC00006 
PS00006   84->88     CK2_PHOSPHO_SITE   PDOC00006 
PS00006   94->98     CK2_PHOSPHO_SITE   PDOC00006 
PS00006   107->111   CK2_PHOSPHO_SITE   PDOC00006 
PS00006   154->158   CK2_PHOSPHO_SITE   PDOC00006 
PS00006   175->179   CK2_PHOSPHO_SITE   PDOC00006 
PS00008   81->87     MYRISTYL           PDOC00008 
PS00009   48->52     AMIDATION          PDOC00009 


(No Pfam data available for DKFZphtes3_50j4.3) 

DKFZphtes3_50n06 

group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1085, polyadenylation signal at pos. 1061 

1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCGGAAGGT 
51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 

101 GACCTGCTGT GCTCACATGC CCCCCTGTCC AGCGAGGACG ACACCTCCCC 

151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 

201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCGCTCCTG 

251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 

301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 

351 GGGACAGGAC CAGGAGGACC TACTACCTCA ATGAGATCCA GAGCTTCGCG 

401 GGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 

451 CCGCCGCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 

501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 

551 CTGGACGGCT CCGTGGACGA GAGGAAGCTG CGCGAGCTGA CGCAGCGCTA 

601 CCTGGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 

651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 

701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC AGCAGCCCGG CCGCGCTGCG 

751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTGC 

801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAAGCCCCTC 

851 TTCGCCTGGT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 

901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCGCGTCCC CCCGGAGGGG 

951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 
1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGGAGTGCT 
1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 


Peptide information for frame 2 


ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 


1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD 
51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 
101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 
151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n06, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50n06, frame 2 


Report for DKFZphtes3_50n06.2 

[LENGTH] 186 
[MW] 21049.39 
[pI] 9.28 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 5.38 % 

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 

SEG 

PRD ccccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 

SEG 

PRD ceeeeeeeeeeccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccch 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG 

SEG xxxxxxxxxx 

PRD hhhhhhcceeecccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhcccc 

SEQ KPLFAW 

SEG 

PRD cccccc 

(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2) 

DKFZphtes3_50n23 

group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 
2 EST hits 

(from other testis libraries) testis specific cDNA? 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872 


1 GGGCACCAGC CACTTTCCAC CATGACTGTG CGCTCGAGGG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTGA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGGAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATGTGGC AGCAGCGGCA 
351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGAGAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGAGAG CCAGAGCAGC TAGGGGAGGA 
501 TGTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC AGCAGCCTGC 
651 CCTGGGAAAG CAGAGACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GGACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTGT GCCATAAGTA CATCTTCTAT AGACGCCTCC 
1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT GAAAGAAACG 
1051 GAGGCTTCCT ACAAGGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA AGTTGCCTGC AGCCTCACCC CGGCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTGACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGTACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGGGACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATGCAGGT TCCCAGCTGG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 

Peptide information for frame 1 


ORF from 22 bp to 1518 bp; peptide length: 499 
Category: similarity to known protein 
Classification: no clue 


1 MTVRSRVADV FGSKDTESLE PVLLPLVDRR FPKKWERPVA ESLGHKDKDQ 

51 EDYFQKGGLQ IKFHCSKQLS LESSRQVTSE SQEEPWEEEF GREMRRQLWL 

101 EEEEMWQQRQ KKWALLEQEH QEKLRQWNLE DLAREQQRRW VQLEKEQESP 

151 RREPEQLGED VERRIFTPTS RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 

201 PMSPSTQQPA LGKQRPMSSV EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 

251 LTWPSLQISP ANIKKKVYHM DMEAQRKNLQ LLSEESELRL PHYLRSKALE 

301 LTTTTMELGA LRLQYLCHKY IFYRRLQSLR QEAINHVQIM KETEASYKAQ 

351 NLYIFLENID RLQSLRLQAW TDKQKGLEEK HRECLSSMVT MFPKLQLEWN 

401 VHLNIPEVTS PKPKKCKLPA ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 

451 RQQGKQMEAV WKTEVASSSY AIEKKTPASL PRDQLRGHPD IPRLLTLDV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n23, frame 1 

PIR:S28589 trichohyalin - rabbit, N = 1, Score = 134, P = 5.3e-05 

TREMBLNEW:AF132479_1 product: "Ese2L protein"; Mus musculus Ese2L 
protein mRNA, complete cds, N = 1, Score = 130, P = 0.00017 


>PIR:S28589 trichohyalin - rabbit 
Length = 1,407 

HSPs: 

Score = 134 (20.1 bits), Expect = 5.3e-05, P = 5.3e-05 
Identities = 88/354 (24%), Positives = 154/354 (43%) 

Query: 29 RRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIK-FHCSKQLSLESSRQVTSESQEEPWE 87 

R+T K +R + L + +-*-E ++ G + F +QL +++ E +EE + 

Sbjct: 165 RQYRDKEQRLQRQELEERRAEEEQLRRRKGRDAEEFIEEEQLRRREQQELKRELREEEQQ 224 

Query: 88 EEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQ 147 

RE + L+EEE RQ++W E Q++LR+ LE++ RE+++R Q E+ + 
Sbjct: 225 RRERREQHERA-LQEEEEQLLRQRRWRE-EPREQQQLRR-ELEEI-REREQRLEQEERRE 280 

Query: 148 ESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQ 207 

+ RRE ++L E ERR ++ + EL RQQR + + 

Sbjct: 281 QQLRRE-QRL-EQEERREQQLRRELEEIREREQRLEQEERREQRLEQEERREQQLKRELE 338 

Query: 208 QPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 266 

+ +QR +E RR + ++++A GS+RW SA++K 

Sbjct: 339 EIREREQR LEQEER-REQLLAEEVREQAR—ERGESLTR-RWQRQLESEAGARQSK 390 

Query: 267 VYHMDMEAQRKNLQLLSEESELRLPHYLRSKALELTTTTM ELGALRLQYLCHKY 320 

VY +R+ QL++ER R+LE E RQL + 

Sbjct: 391 VYS RPRRQEEQSLRQDQERR-QRQERERELEEQARRQQQWQAEEESERRRQRLSARP 446 

Query: 321 IFYRRLQSLRQEAINHVQIMKETEASYKAQNLYI-FLENIDRLQSL-RLQAWTDKQKGLE 378 

RQ +E Q+EE ++ + FLE ++LQ R Q ++ E 

Sbjct: 447 SLRER-QLRAEERQEQEQRFREEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQE 505 

Query: 379 EKHR 382 
++ R 

Sbjct: 506 DRER 509 
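
The percentages on the "Identities" lines of this report can be recomputed from the raw counts. The sketch below parses one such line and derives the percentages by integer division; truncation rather than rounding is an assumption, chosen because it reproduces the figures printed here (for example 88/354 is shown as 24%). The function name `parse_identities` is illustrative.

```python
import re


def parse_identities(line: str) -> dict:
    """Parse 'Identities = a/b (p%), Positives = c/d (q%)' from a BLAST HSP header.

    Returns {name: (matches, aligned_length, percent)} with the percentage
    truncated to an integer (assumption: the report truncates, not rounds).
    """
    pairs = re.findall(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)", line)
    return {name: (int(a), int(b), 100 * int(a) // int(b)) for name, a, b in pairs}


line = "Identities = 88/354 (24%), Positives = 154/354 (43%)"
print(parse_identities(line))
# {'Identities': (88, 354, 24), 'Positives': (154, 354, 43)}
```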


Score = 119 (17.9 bits), Expect = 2.2e-03, P = 2.2e-03 
Identities = 79/357 (22%), Positives = 150/357 (42%) 


Query: 33 KKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 92 

++ E+ + + K +++E Q+ + + +Q R+ + + + EE+F + 

Sbjct: 990 RREEQELRQERDRKFREEEQLLQE— -REEERLRRQERDRKFREEERQLRRQELEEQFRQ 1046 

Query: 93 EMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRR 152 

E R+ LEE+ + Q++++K L QE K R+ E+ R +Q R QL +E++ R 
Sbjct: 1047 ERDRKFRLEEQ-IRQEKEEK-QLRRQERDRKFRE EEQQRRRQEREQQLRRERDRKFR 1101 

Query: 153 EPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSR--RPHLPMSPSTQQPA 210 
E EQL ++E RRL + EL+ + +RR + +++ 

Sbjct: 1102 EEEQLLQEREEERLRRQERARKLREEE-QLLRREEQLLRQERDRKFREEEQLLQESEEER 1160 

Query: 211 LGKQ RPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKKKV 267 

L+QR+E+R + +++ +R+ Q ++++ 

Sbjct: 1161 LRRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQERARKLREEE 1220 

Query: 268 YHMDMEAQ RKNLQLLS-EESELRLPHYLRSKALELTTTTMELGALRLQYL 316 

+ E Q R+ QLL EE ELR + + E E LR Q 

Sbjct: 1221 QLLRQEEQELRQERDRKFREEEQLLRREEQELRRERDRKFREEEQLLQEREEERLRRQER 1280 

Query: 317 CHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRLQ-SLRLQAWTDKQK 375 

K + L E ++ +E + y+A+ + E RL+ LR + +++ 

Sbjct: 1281 ARK— LREEEEQLLFEEQEEQRLRQERDRRYRAEEQFAREEKSRRLERELRQEEEQRRRR 1338 

Query: 376 GLEEKHRE 383 
E K RE 

Sbjct: 1339 ERERKFRE 1346 

Score = 109 (16.4 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 37/113 (32%), Positives = 60/113 (53%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

+QL E R+ E Q +E EE R+ R + EEE++ Q+R+++ L QE + KL 
Sbjct: 764 QQLRRERDRKFREEEQLLQEREEERLRRQERERKLREEEQLLQEREEE-RLRRQERERKL 822 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
R+ EL +E++ -^+ +E+E RE EQL E+ + R R L + E 

Sbjct: 823 REE--EQLLQEREEERLR-RQERERKLREEEQLLRQEEQEL--RQERARKLREEE 872 
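
The Expect and P values printed for these HSPs are consistent with the usual Poisson relation P = 1 - exp(-E), the probability of observing at least one alignment scoring this well by chance. A minimal sketch of that conversion follows; the function name `p_from_expect` is illustrative, and whether the original search program used exactly this formula is an assumption based on the reported numbers.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number E.

    Uses P = 1 - exp(-E); expm1 keeps precision when E is very small.
    """
    return -math.expm1(-expect)


for e in (0.19, 0.30, 0.78):
    print(f"E = {e:.2f} -> P = {p_from_expect(e):.2f}")
# E = 0.19 -> P = 0.17
# E = 0.30 -> P = 0.26
# E = 0.78 -> P = 0.54
```

For very small E the two values coincide to the printed precision, which is why the high-scoring HSPs above list identical Expect and P values.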

Score = 107 (16.1 bits), Expect = 3.0e-01, P = 2.6e-01 
Identities = 35/109 (32%), Positives = 61/109 (55%) 


Query: 71 LESSRQVTSESQEEPWE-EEFGREMRRQL WLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

L Q+ ES+EE +E +++RR+ + EEE++ Q+R+++ L QE + KLR+ 
Sbjct: 742 LREEEQLLQESEEERLRRQEREQQLRRERDRKFREEEQLLQEREEE-RLRRQERERKLRE 800 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

E L +E++ ++ +E+E RE EQL ++ E R R L + E 

Sbjct: 801 E— EQLLQEREEERLR-RQERERKLREEEQLLQERESERLRRQERERKLREEE 850 

Score = 104 (15.6 bits), Expect = 9.4e-02, P = 9.0e-02 
Identities = 84/339 (24%), Positives = 149/339 (43%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEK 123 

+QL E ++ +EE EE RE R++L +LEEEE Q+R++ L E++ +++ 
Sbjct: 451 RQLRAEERQEQEQRFREE EEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDR 507 

Query: 124 LRQWNLEDLAREQQRRWVQLEKEQESPRR EP EQLGEDVE-RRIFTPTSRWRDL 175 

R+ ++ Q RW QL++E + R +P EQL E+ E +R R R+ 

Sbjct: 508 ERRRRQQEQRPGQTWRW-QLQEEAQRRRHTLYAKPGQQEQLREEEELQREKRRQEREREY 566 

Query: 176 EKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRT RRV 231 

+ EL ++R+ + Q+L+R+E+R RR 

Sbjct: 567 REEE-KLQREEDEKRRRQERERQYRELEELRQEEQL-RDRKLREEEQLLQEREEERLRRQ 624 

Query: 232 PTKPK— KSASFPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRK— -NLQLLSEE 285 

+ K + +R-f- L+ ++++ + E +RK QLL G 

Sbjct: 625 ERERKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQER 684 

Query: 286 SELRLPHYLRSKALE LTTTTMELGALRLQYLCHKYIFYRRL-QSLRQEAINHV— 337 

E RL R++ L L ELR+L+ RRQ LRQE + 

Sbjct: 685 EEERLRRQERARKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQLLRQERDRKLRE 744 

Query: 338 — QIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRECL 385 

Q+++E+E + E +L+ R + + ++++ L+E+ E L 

Sbjct: 745 EEQLLQESEEERLRRQ EREQQLRRERDRKFREEEQLLQEREEERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01 
Identities = 42/152 (27%), Positives = 74/152 (48%) 

Query: 36 ERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFG-REM 94 

ER + K +++E ++ +++ +++L E + + E QE E + RE 
Sbjct: 835 ERLRRQERERKLREEEQLLRQEEQELRQERARKLR-EEEQLLRQEEQBLRQERDRKLREE 893 

Query: 95 RRQLWLEEEEMWQQRQKKWA LLEQEHQEKLRQWNLEDLAREQQ-— RRWVQ-LEKE 146 

+ L EE+E+ Q+R +K LL++ +E+LR+ E RE++ RR Q L +E 

Sbjct: 894 EQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKLREEEQLLRREEQELRRE 953 

Query: 147 QESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
+ RE EQL ++ E R R L + E 
Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986 

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01 
Identities = 31/91 (34%), Positives = 52/91 (57%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

++L E R++ E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 
Sbjct: 642 QELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQEREEE-RLRRQERARKLRE 700 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQL 157 
E L R++++ +L +E+E RE EQL 

Sbjct: 701 E— EQLLRQEEQ ELRQERERKLREEEQL 726 


Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01 
Identities = 38/111 (34%), Positives = 57/111 (51%) 


Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLE 130 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ + 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEE-Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ++ E R R + E L 

Sbjct: 988 LLRREEQ ELRQERDRKFREEEQLLQEREEERLRRQERDRKFREEERQL 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R++ E Q EE+ R+ R + EEE++ +Q +++ L QE KLR+ E 
Sbjct: 841 ERERKLREEEQLLRQEEQELRQERARKLREEEQLLRQEEQE LRQERDRKLREE--EQ 895 

Query: 132 LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

L R++++ +L +E++ RE EQL ++ E R R L + E 

Sbjct: 896 LLRQEEQ ELRQERDRKLREEEQLLQESEEERLRRQERERKLREEE 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R+ E Q EE E R L EEE Q +++ L QE + KLR+ E 
Sbjct: 578 EKRRRQERERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREE--EQ 635 

Query: 132 LAREQ QRRWQLEKEQES PRREPEQLGEDVERRI 165 

L R++ Q R +L +E++ RRE ++L ER++ 

Sbjct: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

++L E R++ E Q +E EE R+ R + EEE-H- +Q +++ L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE LRQERERKL 720 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEK 177 

R+ + L RE+Q L +E++ RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQEREQQLRR 768 

Score = 98 (14.7 bits). Expect = 2.6e+00, P = 9.2e-01 
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRI 165 

++ E+EQ RE E+L ++ ER++ 
Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKL 798 

Score = 97 (14.6 bits). Expect = 3.3e+00, P = 9.6e-01 
Identities = 38/129 (29%), Positives = 63/129 (48%) 

Query: 72 ESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL 129 

E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE KLR+ 
Sbjct: 817 ERERKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE LRQERARKLREE— 871 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRT 189 
E L R++++ +L +E++ RE EQL E+ + R R L -I- E L-l- 
Sbjct: 872 EQLLRQEEQ— ELRQERDRKLREEEQLLRQEEQEL— RQERDRKLREEE-QLLQESEEE 925 

Query: 190 QSAHQSRRPHL 200 

+ OR L 
Sbjct: 925 RLRRQERERKL 936 

Score = 96 (14.4 bits). Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 

Query: 46 KDKDQEDYFQKGGLQI-KFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 

+++ QE F + Q+ + ++QL ESQ E + E+ G+ R QL +EE 
Sbjct: 473 RERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRWQL~QEE 529 

Query: 105 MWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERR 164 

++R +A Q QE+LR+ E+L RE++R+ E+E+E E Q ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ— QEQLREE— EELQREKRRQ EREREYREEEKLQREEDEKRR 581 

Query: 165 IFTPTSRWRDLEK 177 
++R+LE+ 

Sbjct: 582 RQERERQYRELEE 594 

Score = 96 (14.4 bits). Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 

Query: 28 DRRFPKKWERPVAESL-GHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW 86 

+R++ + E ELK +++E Q+ + ++ L Q+ + ++E 

Sbjct: 586 ERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE-L 644 

Query: 87 EEEFGREMRRQLWL— EEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQL 143 

+E R++R + L EE+E+ Q+R++K L +E Q L++ E L R+++ R +L 
Sbjct: 645 RQERERKLREEEQLLRREEQELRQERERK LREEEQ-LLQEREEERLRRQERAR— KL 698 

Query: 144 EKEQESPRREPEQLGEDVERRI 165 

+E++ R+E ++L ++ ER++ 
Sbjct: 699 REEEQLLRQEEQELRQERERKL 720 

Score = 95 (14.3 bits). Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ--S 195 

++ E+EQ RE E+L ++ ER++ ++ E+ L + + Q 

Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKLREEEQLLQ 830 

Query: 196 RRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 

R + ++ L ++ + E R R ++ +R+ 

Sbjct: 831 EREEERLRRQERERKLREEEQLLRQE-EQELRQERARKLREEEQLLRQEEQELRQERDRK 889 

Query: 256 LQISPAtaKKKVYHMEMEAQRK NLQLLSEESELRLPHYLRSKAL 299 

L+ ++++ + E RK QLL E E RL R + L 

Sbjct: 890 LREEEQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKL 936 

Score = 94 (14.1 bits). Expect = 1.1e+00, P = 6.8e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEK L 124 

E +R++ E Q EE+ R+ R + + EEE++ Q+R+++ L QE K L 
Sbjct: 977 ERARKLREEEQLLRREEQELRQERDRKFREEEQLLQEREEE-RLRRQERDRKFREEERQL 1035 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R+ LE+ R+++ R +LE EQ +E +QL R F + R ++ E L 

Sbjct: 1036 RRQELEEQFRQERDRKFRLE-EQIRQEKEEKQLRRQERDRKFREEEQQRRRQEREQQL 1092 

Score = 94 (14.1 bits). Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMR-RQLWLEEEEMWQQRQKKWALLEQEHQEK 123 

++L E R+ E Q +E EE R+ R R+L EEE++ + Q++ L QE+ 
Sbjct: 1250 QELRRERDRKFREEEQLLQEREEERLRRQERARKLREEEEQLLFEEQEEQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R++ E+ ARE++ R +LE+E R+E EQ R F R E+ B 

Sbjct: 1306 DRRYRAEEQFAREEKSR— RLEREL RQEEEQRRRRERERKFREEQLRRQQEE-EQRR 1359 


Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T+Q A R E+ R++ p 

Sbjct: 1360 RQLRERQFREDQSRRQVL— EPGTRQFARVPVRSSPLYEYIQEQRSQYRP 1407 

Score = 93 (14.0 bits). Expect = 8.3e+00, P = 1.0e+00 
Identities = 41/145 (28%), Positives = 72/145 (49%) 


Query: 28 DRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW- 86 
+RR ++ER+E ++Q+ + Q+ L R + QE+ + 
Sbjct: 408 ERRQRQERERELEEQARRQQQWQAEEESERRRQ-RLSARPSLRERQLRAEERQEQEQRFR 466 

Query: 87 -EEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEKLRQWNLEDLAREQQRRWVQ 142 
EEE RE R++L +LEEEE Q+R++ L E++ +++ R+ f+ Q RW Q 
Sbjct: 467 EEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRW-Q 525 

Query: 143 LEKEQESPRR----EP---EQLGEDVE 162 
L++E + R +P EQL E+ E 
Sbjct: 526 LQEEAQRRRHTLYAKPGQQEQLREEEE 552 


Score = 91 (13.7 bits), Expect = 2.4e+00, P = 9.1e-01 
Identities = 38/110 (34%), Positives = 57/110 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL- 129 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 988 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 180 

++L +E+ R++ E+EQ RE E+L R F R L + EL 

Sbjct: 989 LRREEQELRQERDRKF— REEEQLLQEREEERLRRQERDRKFREEER— QLRRQEL 1040 

Score = 89 (13.4 bits). Expect = 2.2e+00, P = 8.9e-01 
Identities = 35/138 (25%), Positives = 65/138 (47%) 


Query: 82 QEEPWEEEFGREMRRQLWLEEEEM--WQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRR 139 
Q E++ E+R + + +E E WQ+++++ L E+E Q K R+ + +R+ + + 
Sbjct: 111 QNRRQEDQRRFELRDRQFEDEPERRRWQKQEQERELAEEEEQRKKRERFEQHYSRQYRDK 170 

Query: 140 WVQLEKEQ-ESPRREPEQL----GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 194 
+L++++ E RE EQL GDEF +RE+EL Q + 
Sbjct: 171 EQRLQRQELEERRAEEEQLRRRKGRDAEE--FIEEEQLRRRE(3QELKR-ELREEEQQRRE 227 

Query: 195 SRRPHLPMSPSTQQPALGKQR 215 
R H ++ L ++R 
Sbjct: 228 RREQHERALQEEEEQLLRQRR 248 


Score = 50 (7.5 bits). Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTDKQKGLEEKHRE 383 

R + R+E Q+ +E E + + LE +R Q LR + ++++ E++ R 

Sbjct: 245 RQRRWREEPREQQQLRRELEEIREREQR LEQEERREQQLRREQRLEQEERREQQLRR 301 

Query: 384 CLSSMVTMFPKLQLEWNVHLNIP-EVTSPKPKKCKLPAASPRHIRPSGPTYKQPFLSRHR 442 

L + +L+ E + E + K +L R R ++ L+ 

Sbjct: 302 ELEEIREREQRLEQEERREQRLEQEERREQQLKRELEEIREREQRLEQEERREQLLAEEV 361 

Query: 443 ACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASLPRDQ 484 

+ AR++G+ + W+ ++ S + A + K S PR Q 
Sbjct: 362 R EQARERGESLTRRWQRQLESEAGARQSKV-YSRPRRQ 398 

Score = 40 (6.0 bits). Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL--ELTTTTMELGALRLQYLCHKYIFYRRL-QSLRQE 332 

R+ QLL E E RL R++ L E E LR Q K+ +L Q +E 

Sbjct: 959 REEEQLLQEREEERLRRQERARKLREEEQLLRREEQELR-QERDRKFREEEQLLQEREEE 1017 

Query: 333 AINHVQI— MKETEASYKAQNLYI-FLENIDRLQSLRLQAWTDKQ-KGLEEKHRE 383 

+ + +EE +QL F+DR LQ +K+ K L + R+ 

Sbjct: 1018 RLRRQERDRKFREEERQLRRQELEEQFRQERDRKFRLEEQIRQEKEEKQLRRQERD 1073 

Score = 37 (5.6 bits). Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 


Query: 276 RKNLQLLSEESELRLPHYLRSKAL ELTTTTMELGALRLQYLCHKYIFYRRLQSLRQE 332 

R+ QLL EERL R+L E E LRQ K R+LQE 
Sbjct: 775 REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERRL REEEQLLQE 831 

Query: 333 AINHVQIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRE 383 

+E E + + + E L+ R + ++++ L ++ +E 
Sbjct: 832 REEERLRRQERERKLREEEQLLRQEE-QELRQERARKLREEEQLLRQEEQE 881 
Pedant information for DKFZphtes3_50n23, frame 1 

Report for DKFZphtes3_50n23.1 

[LENGTH] 499 
[MW] 58885.69 
[pI] 9.67 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 10.42 % 


SEQ MTVRSRVADVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 
PRD ccccccceeecccccccccceeeccccccccccccchhhhhhhccccccc^ 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 

SEG xxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx 

PRD eeeecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeecccc^ 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 

SEG xxxxxxxxxxxxxxx . . . 

PRD '^ccccccchhhhhccccccccccccccccccccccccceeeeeeccccccccccccceee 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 
SEG xxxxxxxx 

PRD «<^ccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhJA^ 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENID 

SEG 

PRD i^h*^*^hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLQLEWNVHLNIPEVTSPKPKKCKLPA 

SEG 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccc^^ 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASL 

SEG 

PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhh 

SEQ PRDQLRGHPDIPRLLTLDV 

SEG 

PRD ccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 
DKFZphtes3_6b21 


group: testes derived 

DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to human KIAA0256 
gene product. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to KIAA0256 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 


1 GGCAAGCCGA CGGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 
51 CTCGCGGCAT GGCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 
101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 
151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TGTCTTCCCC AGCTCTGCAG 
201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT 
251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 
301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 
351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG 
401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 
451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 
501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 
551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 
601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 
651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 
701 TAAGAGAAGT AGTAAAACCA GCTGCAGTGT TATCAAAGGG TGAAATAGTG 
751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 
801 TCCTTCATGT ACAAGAGAGT TATCTTGGAC ACCAATGGGT TATGTTGTTC 
851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 
901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 
951 TATACCATCT TCTGAAGCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 
1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 
1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 
1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 
1151 ATGCCGAGGA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA 
1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 
1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA 
1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 
1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 
1401 ATGTGCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 
1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG 
1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA 
1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 
1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 
1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 
1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 
1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC 
1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 
1851 TAAAGAAGTG GATGCTTGTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 
1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 
1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 
2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG 
2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 
2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 
2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 
2201 CCCAGGATCA GTTCCACAAG ATGGTTGAGC TGACAGTGGC GGCCCGACAG 
2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 
2301 CAGGCCTCAG GCACCTCCCA GCCTACCCAC ACAGGGCCCC AGCTGCCCTG 
2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT 
2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATGTA CCCTGGAGCT 
2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT 
2501 GAGAGTTCTT GCCTGTGTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA 
2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 
2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT 
2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATGTG 
2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GGCTTTCTGA 
2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTGATACT TAGAATACTT 
2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATAGCTG AGCCAAGTGA 
2851 GTGAGTTTGC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 
2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT 
2951 GGCAGAAGGG TGCAGGGCTG CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 
3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 
3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 
3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 
3151 AATGTGTATT GAAGTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 
3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 
3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 
3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAAAAAAAAA 


BLAST Results 


Entry HS773347 from database EMBL: 
human STS WI-18160. 
Score = 813, p = 2.9e-30, identities = 167/171 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 157 bp to 2499 bp; peptide length: 781 
Category: similarity to known protein 
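
The reported ORF coordinates and peptide length can be cross-checked arithmetically. The following is a minimal sketch, assuming the interval is 1-based and inclusive and that the stop codon lies just outside it; the function name `orf_codon_count` is hypothetical and not part of the original report.

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive ORF interval.

    Assumption: the interval covers only coding triplets (stop codon excluded).
    """
    length = end_bp - start_bp + 1  # inclusive span in nucleotides
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


# ORF from 157 bp to 2499 bp; the report gives a peptide length of 781.
print(orf_codon_count(157, 2499))  # 781
```

Under these assumptions the 2343-nucleotide span corresponds to exactly 781 codons, which agrees with the stated peptide length.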


1 MVRVLRSMCL PQLCSHILSV CSGTTSDRNV YSVPGSQYLY NQPSCYRGFQ 
51 TVKHRNENTC PLPQEMKALF KKKTYDEKKT YDQQKFDSER ADGTISSEIK 
101 SARGSHHLSI YAENSLKSDG YHKRTDRKSR IIAKNVSTSK PEFEFTTLDF 
151 PELQGAENNM SEIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK 
201 NNPNESVTAN AATNSPSCTR ELSWTPMGYV VRQTLSTELS AAPKNVTSMI 
251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHIIHPTQK SKASQGSDLE 
301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 
351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 
401 PVVVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LMKKGKQREI 
451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSDDTQD GESGGDDQFP 
501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPNHTTFPK 
551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQDRMYQKD PVKAKTKRRL 
601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQNI 
651 PFVFALNRKA LGRSLNKAVP VSVVGIFSYD GAQDQFHKMV ELTVAARQAY 
701 KTMLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHYIEI 
751 WKKHLEAYSG CTLELEESLE ASTSQMMNLN L 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6b21, frame 1 

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score = 
786, P = 3.6e-78 

TREMBL:PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, 
N = 2, Score = 161, P = 5.1e-10 

TREMBL:RNNrLH_l Rat heavy neurofilament subunit (NF-H) mRNA, 3' end, 
N = 1, Score = 150, P = 9.1e-07 


>SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
Length = 635 

HSPs: 


Score = 786 (117.9 bits). Expect = 3.6e-78, P = 3.6e-78 
Identities = 190/424 (44%), Positives = 263/424 (62%) 
Query: 369 KKSQLPVQLDLGGMLTALEKKQHSQHAKQ--SSKPVVVSVGAVPVLSKECASGERGRRMS 426 

KK++ PVQLDLG ML ALEK+Q + A+Q ++ + P+ +V + ++ + S 

Sbjct: 16 KKNKTPVQLDLGDMLAALEKQQQAMKARQITNTRPLSYTVVTAASFHTKDSTNRKPLTKS 75 

Query: 427 Q-MKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVSPAFTS 485 

Q T N +D ++ KKGK++EI K K+PT+LKK+ILKER+E+K RL + S 
Sbjct: 76 QPCLTSFNSVDIASSKAKKGKEKEIAKLKRPTALKKVILKEREEKKGRLTVD--HNLLGS 133 

Query: 486 DDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPG--TELQRDTEASHL-- 541 

++ + D P++ G+ + S S+ S+ P T + + + AS 

Sbjct: 134 EEPTEMHLDFIDDLPQEIVSQEDTGLS-MPSDTSLSPASQNSPYCMTPVSQGSPASSGIG 192 

Query: 542 APN-HTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 600 

+P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+YQKDPV+AK +RRL 
Sbjct: 193 SPMASSTITKIHSKRFREYCNQVLCKEIDECVTLLLQELVSFQERIYQKDPVRAKARRRL 252 

Query: 601 VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 660 

V+GLREV KH+KL K+KCVIISPNCEKIQSKGGLD+ L+ +I A EQ IPFVFAL RKA 
Sbjct: 253 VMGLREVTKHMKLNKIKCVIISPNCEKIQSKGGLDEALYNVIAMAREQEIPFVFALGRKA 312 

Query: 661 LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRP 717 

LGR +NK VPVSVVGIF+Y GA+ F+K+VELT AR+AYK M+ ++QE E 
Sbjct: 313 LGRCVNKLVPVSVVGIFNYFGAESLFNKLVELTEEARKAYKDMVAAMEQEQAEEALKNVK 372 

Query: 718 QAPPSLP-TQGPS CPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTL ELE 766 

+ P + ++PS C P+EEYW++EG EE 

Sbjct: 373 KVPHHMGHSRNPSAASAISFCSVISEP--ISEVNEKEYETNWRNMVETSDGLEASENEKE 430 

Query: 767 ESLEASTSQ 775 

S + STS+ 
Sbjct: 431 VSCKHSTSE 439 


Pedant information for DKFZphtes3_6b21, frame 1 


Report for DKFZphtes3_6b21.1 

[LENGTH] 781 
[MW] 87393.44 
[pI] 8.94 
[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 4e-75 
[PROSITE] MYRISTYL 4 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 16 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 6 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 8.45 % 


SEQ MVRVLRSMCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRNENTC 

SEG 

PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc 

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHHLSIYAENSLKSDG 

SEG xxxxxxxxxxxx 

PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc 

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS 

SEG 

PRD cccccchhhhheeeccccccccceeecccccccccccchhhhhhccccccccceeecchh 

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS 

SEG 

PRD hhhhhhheeeeecccceeeeccccceeeeeecccccccceeeeeccceeeeeeccccccc 

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE 

SEG 

PRD ccccceeeeehhhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch 

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP 

SEG . . . .xxxxxxxxxxxxxx 

PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc 

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeecccccc 
SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS 
SEG 
PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhh 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 
SEG 
PRD ccccccccccccccccccchhhhhhcccccceeeeccccccccccccccccc 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 
SEG 
PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 
SEG xxxxxxxxxx 
PRD hhhhhhhhhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhcccceeeeccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP 
SEG 
PRD cccccccceeeeeeeeecccccchhhhhhhhhhhhhhhhhhhhh 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 
SEG xxxxxxxxxxxxx 
PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ 
SEG 
PRD 


Prosite for DKFZphtes3_6b21.1 

PS00001 135->139 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001 
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001 
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001 
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001 
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006 
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006 
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 
PS00008 633->639 MYRISTYL PDOC00008 
PS00009 421->425 AMIDATION PDOC00009 


(No Pfam data available for DKFZphtes3_6b21.1) 
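
The ASN_GLYCOSYLATION positions in the Prosite listing for this clone can in principle be reproduced by scanning the peptide with the PROSITE PS00001 pattern N-{P}-[ST]-{P}. The following is a minimal sketch, not the tool used for the original report: the function name `scan_n_glyc` is hypothetical, and the "start->end" convention (1-based start, end exclusive) is an assumption inferred from the listed coordinates. Only a ten-residue fragment (residues 131-140 of the peptide) is scanned here.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}.
# The lookahead lets overlapping sites be reported as well.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def scan_n_glyc(peptide: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs for each match.

    `offset` is the 1-based position of the first residue of `peptide`.
    Assumption: `end` is exclusive, mirroring the 'start->end' listing.
    """
    return [(m.start() + offset, m.start() + offset + 4)
            for m in N_GLYC.finditer(peptide)]


fragment = "IIAKNVSTSK"  # residues 131-140 of the 781-residue peptide
print(scan_n_glyc(fragment, offset=131))  # [(135, 139)]
```

Under these assumptions the fragment yields the site listed as 135->139; the remaining entries would need the full peptide and the corresponding PROSITE patterns.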
DKFZphtes3_6c11 


group: signal transduction 

[...] similarity to A. ambisexualis antheridiol steroid receptor. 

[...] similar to the A. ambisexualis antheridiol steroid receptor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this receptor. 

Strong similarity to YNL132w 

C.elegans/F55A12.8, S.pombe/YDK9_SCHPO, S.cerevisiae/YNL132w, 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873 

1 GCTGTGCCTT CTCTTTCGGA GTTGTTCCGT GCTCCCACGT GCTTCCCCTT 
51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 
101 CATGCATCGG AAAAAGGTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 
151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 
201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 
251 GGCTCGGCCT TCAGTGCTGT GGTGTTATAA GAAAGAGCTG GGGTTTAGCA 
301 GTCACCGGAA GAAAAGAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 
351 ACACTGAACA TAAAGCAGGA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 
401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA 
451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG 
501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 
551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 
601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 
651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 
701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 
751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG 
801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 
851 GTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT GTCTTGAAAT 
901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 
951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 
1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 
1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 
1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCAGTCTC TAAATCCTGA 
1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 
1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA 
1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG 
1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 
1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 
1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC 
1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCAGG 
1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 
1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 
1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 
1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 
1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCAGA TGCTCTCCGA 
1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC 
1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTG CCTTGAAGGG 
1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 
1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 
1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 
2001 GATTATCAAG GGATGGGCTA TGGCAGCCGT GCTCTGCAGC TGCTGCAGAT 
2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 
2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 
2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 
2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 
2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 
2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 
2351 GACGCTCACT GATGAGGATG AGGCTGACCA GGGAGGCTGG CTTGCAGCCT 
2401 TCTGGAAAGA TTTCCGACGG CGGTTCCTAG CCTTGCTCTC CTACCAGTTC 
2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT 
2501 GGGGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAAGCACTCT 
2551 TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG 
2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 
2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 
2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 
2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 
2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 
2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 
2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 
2951 GGAAGTAGGG AAGCTGAAGA GCATGGACCT CTCTGAATAC ATAATCCGTG 
3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 
3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 
3101 AGAACCCAAA CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 
3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 
3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACTGAACCCT CTCTGGCTGG 
3251 ACTGTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 
3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 
3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 
3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT 
3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC 
3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CGCCTGAGGC CCTTAAGTAC 
3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 
3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 
3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT 
3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 
3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 
3801 CCATTTGGGA AAAGATGTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 
3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969) 
ATP_GTP__A (284-292) 


1 MHRKKVDNRI RILIENGVAE RQRSLFVVVG DRGKDQVVIL HHMLSKATVK 
51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT 
101 NIRYCYYNET HKILGNTFGM CVLQDFEALT PNLLARTVET VEGGGLVVIL 
151 LRTMNSLKQL YTVTMDVHSR YRTEAHQDVV GRFNERFILS LASCKKCLVI 
201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 
251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG 
301 AVAFGYSNIF VTSPSPDNLH TLFEFVFKGF DALQYQEHLD YEIIQSLNPE 
351 FNKAVIRVNV FREHRQTIQY IHPADAVKLG QAELVVIDEA AAIPLPLVKS 
401 LLGPYLVFMA STINGYEGTG RSLSLKLIQQ LRQQSAQSQV STTAENKTTT 
451 TARLASARTL HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP 
501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQMLSD 
551 APAHHLFCLL PPVPPTQNAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 
601 SGDLIPWTVS EQFQDPDFGG LSGGRVVRIA VHPDYQGMGY GSRALQLLQM 
651 YYEGRFPCLE EKVLETPQEI HTVSSEAVSL LEEVITPRKD LPPLLLKLNE 
701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 
751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 
801 GKPAQPALSR EELEALFLPY DLKRLEMYSR NMVDYHLIMD MIPAISRIYF 
851 LNQLGDLALS AAQSALLLGI GLQHKSVDQL EKEIELPSGQ LMGLFNRIIR 
901 KVVKLFNEVQ EKAIEEQMVA AKDVVMEPTM KTLSDDLDEA AKEFQEKHKK 
951 EVGKLKSMDL SEYIIRGDDE EWNEVLNKAG PNASIISLKS DKKRKLEAKQ 
1001 EPKQSKKLKN RETKNKKDMK LKRKK 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_6c11, frame 3 

[...] gene: "F55A12.8"; Caenorhabditis elegans cosmid 
F55A12., N = 1, Score = 2782, P = 1.1e-289 

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 2549, P = 3.5e-273 

SWISSPROT:YXXl_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score = 
1013, P = 3.2e-102 

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN 
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296 

>SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME I. 
Length = 1,033 

HSPs: 

Score = 2843 (426.6 bits). Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 

Query: 1 MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 60 

M +K +D+RI LI+NG E+QRS FVVVGDR +DQVV LH +LS++ V ARP+VLW YK 

Sbjct: 1 MPKKALDSRIPTLIKNGCQEKQRSFFVVVGDRARDQVVNLHWLLSQSKVAARPNVLWMYK 60 

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF+SHRKKR +++K+IK G + +DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKIKKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDV 179 
M VLQDFEALTPNLLART+ETVEGGG+VV+LL +NSLKQLYT++MD+HSRYRTEAH DV 

Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIVVLLLHKLNSLKQLYTMSMDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELK 239 

RFNERFILSL +C+ CLVIDD+LN+LPIS ++ALPP +++ + ++EL+ 

Sbjct: 181 TARFNERFILSLGNCENCLVIDDELNVLPISGG-KNVKALPPTLEEDN--STQNSIKELQ 237 

Query: 240 ESLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIA 299 

. on« ^ ^ KTLDQA+AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAIA 

Sbjct: 238 ESLGEDHPAGALVGVTKTLDQARAVLTFVESIVEKSLKGTVSLTAGRGRGKSAALGLAIA 297 

Query: 300 GAVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVN 359 

GYSNIF+TSPSP+NL TLFEF+FKGFDAL Y+EH+DY+IIQS NP ++ A++RVN 
Sbjct: 298 AAIAHGYSNIFITSPSPENLKTLFEFIFKGFDALNYEEHVDYDIIQSTNPAYHNAIVRVN 357 

Query: 360 VFREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 
+FR+HRQTIQYI P D+ LGQAELVVIDEAAAIPLPLV+ L+GPYLVFMASTINGYEGT

Sbjct: 358 IFRDHRQTIQYISPEDSNVLGQAELVVIDEAAAIPLPLVRKLIGPYLVFMASTINGYEGT 417

Query: 420 GRSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEK 479 

GRSLSLKL+QQLR+QS S + NK+ + + + S RTL E+SL E IRYA GD +E 

Sbjct: 418 GRSLSLKLLQQLREQSRI— YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474 

Query: 480 WLNDLLCLDCLN-ITRIVS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537 

+ G P P C LY V+RDTLF YH SE FLQR+M+LYVASH

Sbjct: 475 WLNKLLCLDAASYVSRMATQGFPHPSECSLYRVSRDTLFSYHPISEAFLQRMMSLYVASH 534

Query: 538 YKNSPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRG 597 

YKNSPNDLQ++SDAPAH LF LLPPV LP+ + VIQ+ LEG ISR+SI+NSLSRG 

Sbjct: 535 YKNSPNDLQLMSDAPAHQLFVLLPPVDLKNPKLPDPICVIQLALEGSISRESIMNSLSRG 594 

Query: 598 KKASGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFP 657
LG R+VRIAV MGYG+RA+QLL Y+EG+F

Sbjct: 595 QRAGGDLIPWLISQQFQDENFAALGGARIVRIAVSPEHVKMGYGTRAMQLLHEYFEGKFI 654 

Query: 658 CLEEKVLETPQEIHTVSSEAV SLLEEVITPR— KDLPPLLLKLNERPAERLDYLGVS 712 

+ + E + +L E I R K +PPLLLKL+E E L Y+GVS

Sbjct: 655 SASEEFKAVKHSLKRIGDEEIENTALQTEKIHVRDAKTMPPLLLKLSELQPEPLHYVGVS 714

Query: 713 YGLTPRLLKFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFR 772 

YGLTP L KFWKR G+ P+YLRQT NDLTGEH+C+ML+ L D WL AF + + F

Sbjct: 715 YGLTPSLQKFWKREGYCPLYLRQTANDLTGEHTCVMLRVLEGRDSE WLGAFAQNFY 770 

Query: 773 RRFLALLSYQFSTFSPSLALNI IQNRNMGKP AQPALSREELEALFLPYDLKRLEMY 828 

RRFL+LL YQF F+ AL+++ N G + L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RRFLSLLGYQFREFAAITALSVLDACNNGTKYVVNSTSKLTNEEINNVFESYDLKRLESY 830 
Query: 829
S N ++DYH+I+D++P ++ +YF + D + LS Q ++LL +GLQ+K+ + D LEKE LP
Sbjct: 831

Query: 888 SGQLMGLFNRIIRKVVKLFNEVQEKAIEEQMVAAKDVVME PTMKTLSDDLDE 939
S QL+ -I- ++ +K++K +E++ K IEE++ + K P ++L ++L E
Sbjct: 891

Query: 940
A E +K+ + ++DL +Y IRG++E+W KA N I R +
Sbjct: 951 GADEAMLALREKQRELINAIDLEKYAIRGNEEDW KAAEN-QIQKTNGKGARWSI 1004

Query: 999
K E +++ L +++TK K K K +K
Sbjct: 1005

Pedant information for DKFZphtes3_6cll, frame 3
Report for DKFZphtes3_6cll.3

[LENGTH] 1025
[MW] 115704.57
[pI] 8.50

[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 0.0

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0

[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05

[PROSITE] ATP_GTP_A 1

[PROSITE] RGD 1

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 11.80 %

SEQ MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK

SEG

PRD cccccccchhhhhhcccccccceeeeeeeeccccceeeeehhhhhhhhhhccceeehhhh

SEQ KELGFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFGM 

SEG 

PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee 

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDVV

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ GRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDCSLGPSDLELRELKE 

SEG 

PRD hhhhhhhhhhhcccceeeeeecceeeecccccccccccccccccccccccchhhhhhhhh 

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG

SEG xxxxxxxxx 

PRD hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh 

SEQ AVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVNV

SEG XXX 

PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhheeeeeccccccceeeeeeh 

SEQ FREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGTG

SEG 

PRD hhhhhhheeeeccccccccccceeeehhhhhccchhhhhhhccceeeeeeeccccccccc 

SEQ RSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEKW 
SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh 

SEQ LNDLLCLDCLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKN 

SEG xxxxxxxxxxx 

PRD hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc 

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA

SEG 

PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc 

SEQ SGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQHYYEGRFPCLE

SEG 

PRD cccchhhhhhhhhhhccccccccceeeeeeccccccccccchhhhhhhhhhhhcccchhh 

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL 
SEG xxxxxxxxxx

PRD hhhhhccccccchhhhhhhhhhhhhhccccccccccccccccccceeeeccccccchhhh

SEQ KFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFRRRFLALLS

SEG

PRD hhhhhcccceeeeeccccccccceeeeeeecccccccccchhhhh

SEQ YQFSTFSPSLALNIIQNRNMGKPAQPALSREELEALFLPYDLKRLEMYSRNMVDYHLIMD

SEG

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhh

SEQ MIPAISRIYFLNQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQLMGLFNRIIR

SEG xxxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh

SEQ KVVKLFNEVQEKAIEEQMVAAKDVVMEPTMKTLSDDLDEAAKEFQEKHKKEVGKLKSMDL

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhh

SEQ SEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQEPKQSKKLKNRETKNKKDMK

SEG xxxxxxxxxxxxxxx

PRD cceeecccchhhhhhhhhccccceeeeeeccchhhhhhhhcccccccccccccccchhhh

SEQ LKRKK

SEG xxxxx

PRD hhccc


Prosite for DKFZphtes3_6cll.3

PS00016 966->969 RGD PDOC00016
PS00017 284->292 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_6cll.3) 
DKFZphtes3_6dl6


group: testes derived 

DKFZphtes3_6dl6 encodes a novel 695 amino acid protein nearly identical to a sequence from 
human PAC clone WUGSC:H_DJ1185I07.2. 

The cDNA differs from the proposed gene model: it contains additional exons.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


WUGSC:H_DJ1185I07.2, differences to gene model

differences to gene model of WUGSC:H_DJ1185I07.2: two exons skipped.

Sequenced by BMFZ

Locus: /map="7q11.23-q21"

Insert length: 4572 bp

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520


1 GGCGGCGCTA GCTTCGGAGT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC 
51 GGCGCGGCGC TCGCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAG7G 
101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA 
151 GATTGGAGCA TATGATCAAC AAATATGGGA TU^TCTGTT GAACAGAGAG 
201 AAATCAAGGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA 
251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA 
301 GCCTGAAAGT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG 
351 TATTTTTCCC CTTTTTCTTC CGGTGGTGGT TACAAGTAAC ATCAAAGGTC 
401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT 
451 ATTATTCTGC TCCACTTCTA GCCCACACAG CATACCTCTG ACAGAGGTGA 
501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT 
551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG 
601 AAGGAAATTA AGAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG 
651 GTTCTAGTAC CACAGATAAC ACACAAGAGG GAGCAGTTCA GAACCACGGT 
701 ACAAGCACCT CTCACAGCGT TGGCACTGTC TTCAGAGATC TCTGGCATGC
751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT 
801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT 
851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT 
901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA 
951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCAGT 
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC 
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT 
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT 
1151 TCAAGTTCCA GACAGGATTC TGAGAGTGCA AGGCCAGAAT CTGAAACAGA 
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT 
1251 GTACCAGTGA GACAGATGTG GAAAATCATC AGATTAATCC ATGTGTGAAA 
1301 AAAGAATATA GAGATGACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA 
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA 
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA 
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG 
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT 
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT 
1601 TATGTGATTG CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGGT 
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT 
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC AGCGATTACT TTTTGCAAAA 
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC 
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC 
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT 
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA 
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGAGGAAC
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG 
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA 
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG 
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC 
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC 
2251 TGCCTGAAAG CTTGTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA 
2301 GTTGAAGTGT TTACATCAGA CTGTCTTGTG CAATTCTTAT ATTTATTTTA 
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA 
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT 
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG 
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTGAAC 
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT 
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG 
2651 TTTCTCTTGA ATTATTTTGG AACAATGCCA GGATCCAAAC TGATTAAGTT
2701 ACAGTTTAAG CACCCTTCAG TATTAATATA TACGGTATTA TATAACAGGT 
2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTGTAAA 
2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG 
2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC 
2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG 
2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TGAGTGAGGT 
3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT 
3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTAGTGTAGG 
3101 CTATTCATAC CACACTGAAA TGAACAACTG AAGAATAAGG CTAAGAACCA 
3151 ATAAAATATT TCTCTAATTG CTAGTTGTAA AACTGTATCC AAATTTTCAG 
3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA 
3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA 
3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG 
3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG 
3401 ATTTTAAGTT GTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA 
3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA 
3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATGT 
3551 GGAATATCCT CATATTTTTA CCATATTTTA AGAACTTTAA GACGATTAAT 
3601 TGTAAATAAT TTATTTGATT GGTGCAGTTC TAATCCCTAA ATCATAATCT 
3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT 
3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT 
3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTPTAT TTGCCTATGT 
3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA 
3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA 
3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT 
3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA 
4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA 
4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA 
4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT 
4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG 
4201 GGAGTTGATT TATTAAGTAC AGTATACCTC TCAACAGTTT ATAAATAATA 
4251 TGTTGAATTA TGTCAGTGTG GGCAGCAGTA GAATACTAAA AGGAAAATGT
4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA 
4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA 
4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG
4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT GCCACATGTG TTGTATTTGA
4501 AAGTTTTACA GTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA 
4551 AAAAAAAAAA AAAAAAAAAA AA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 107 bp to 2191 bp; peptide length: 695 

Category: known protein

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381)
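
The ORF coordinates, the reading frame and the peptide length in these reports are tied together by simple arithmetic. The following sketch is illustrative only: `orf_summary` is a hypothetical helper, and it assumes (as the listed sequences suggest) that the ORF end coordinate is the last base of the final sense codon, so the stop codon is not counted.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (reading frame, peptide length) for 1-based inclusive ORF coordinates.

    Assumes the coordinates cover whole codons and exclude the stop codon.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, span // 3


if __name__ == "__main__":
    # "ORF from 107 bp to 2191 bp; peptide length: 695", reported for frame 2.
    print(orf_summary(107, 2191))
```

Applied to the other clones in this section, the same helper gives frame 1 and 233 residues for the ORF from 268 bp to 966 bp, and frame 3 and 188 residues for the ORF from 810 bp to 1373 bp, in agreement with the reported values.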


1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL 
51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF
101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS 
151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS 
201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS 
251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE 
301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS 
351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE 
401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN 
451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV 
501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF
551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS 
601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD
651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS 

BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 100, P = 0.08

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, P
= 0

>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone
DJ1185I07 from 7q11.23-q21, complete sequence.
Length = 588 

HSPs: 

Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00
Identities = 510/515 (99%), Positives = 512/515 (99%)

Query: 35 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 94 

GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 
Sbjct: 1 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 60

Query: 95 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 154 

TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP
Sbjct: 61 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 120

Query: 155 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 214 

KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 
Sbjct: 121 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 180 

Query: 215 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 274

AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 
Sbjct: 181 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 240 

Query: 275 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 334 

GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 
Sbjct: 241 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 300 

Query: 335 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 394 

EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 
Sbjct: 301 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 360 

Query: 395 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 454

PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 
Sbjct: 361 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 420 

Query: 455 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 514 

HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 
Sbjct: 421 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 480 

Query: 515 VIISFVVRVSLVWIFFFLLCVAERTYKQRLLFAKL 549

VIISFVVRVSLVWIFFFLLCVAERTYKQ L+ K+
Sbjct: 481 VIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKM 515

Score = 409 (61.4 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00
Identities = 92/115 (80%), Positives = 98/115 (85%)

Query: 595 DVIVSS AFLLTISVVFI CCA QINLYLKMEKKPNKKEELTLVNNVLK 640 

DVIV S +F++ +S+V+I C A QINLYLKMEKKPNKKEELTLVNNVLK 

Sbjct: 474 DVIVLSMVIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKMEKKPNKKEELTLVNNVLK 533 

Query: 641 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 695

LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS
Sbjct: 534 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 588 

Pedant information for DKFZphtes3_6dl6, frame 2

Report for DKFZphtes3_6dl6.2

[LENGTH] 695
[MW] 78466.68
[pI] 9.30

[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07
from 7q11.23-q21, complete sequence. 0.0

[PROSITE] CYTOCHROME_C 1

[KW] TRANSMEMBRANE 6

[KW] LOW_COMPLEXITY 5.32 %

SEQ MASKVTDAIVWYQKKIGAYDQQIWEKSVEQREIKGLRNKPKKTAHVKPDLIDVDLVRGSA

MEM 

SEQ FAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQVTSKVIFFWLLVLYLLQVAAIVLFCST 

DDn XXXXXXXXXXX 

PRO hhhhcccccccccccccceeeeeccl^hhhhhhhhhhhhhhhhAhhi;AAi;i;h^ 

^®WMMMMMMMMMMM^lMM^l^^MM^I^I^tt^^ 

SEQ SSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTPKPPLSTGGKRRRKLRKAAHLEVHREG
SEQ DGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWHAAFFLSGSKKAKNSIDKSTETDNGYV


SEQ SLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNTGTLRNGPSKDTQRTITNVSDEVSSEE
PRO =="<=«e«cccccccccccccccccccce;;;cccccccccccc«eeeccccccc« 


SEQ GPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPNEDAPKSGTSCSSRCSSSRQDSESARP
P^D ccccc;;;eeeccccccch.;Hhhhccccccccccccccc^^^^^ 


SEQ 
SEG 
PRD 
MEM 


ESETEDVLWEDLLHCAECHSSCTSETDVENHQINPCVKKEYRDDPFHQSHLPWLHSSHPG 

cccchhhhhhhhhhhhcccccccccccccccccccce^eeecccccc^ 


LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR 


SEQ 
SEG 

em cccceeeeeecccccccc;;eeehhhihhi;hAccccccccccccccccc^^ 

M>ffl<^l^(MHMMM^D«^IMMHM^OIMHHH^IMMM^IM 

SEQ LSQATDLEQLTAHSASELYVIAFGSNEDVIVLSMVIISFVVRVSLVWIFFFLLCVAERTY


• MMMMMMMMHMMMMMMMMMMM»QD4MMM] 


SEQ KQRLLFAKLFGHLTSARRARKSEVPHFRLKKVQNIKMWLSLRSYLKRRGPQRSVDVIVSS 
PR^ hhhhhhhhhhhhhhhhhhhhhh< 


SEQ 
SEG 


ccccccceeeeeehhhhhhhhhhhhccccceeeeeeee 
• ••..«•,.,. .MMMHMMM 

AFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLKLATKLLKELDSPFRLYGLTM

NPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS
SEG 

PRD cchhhhheeeeeeeeecchhhhhccceeeeeeccc 
MEM MMMMMMMMMMMMMMMMMMMMMMMMM 


Prosite for DKFZphtes3_6dl6.2 
PS00190 375->381 CYTOCHROME_C PDOC00169 

(No Pfam data available for DKFZphtes3_6dl6.2)
DKFZphtes3_72kll


group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to an S. pombe
hypothetical repeat-containing protein.

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal
(S-K-L) signature. This signal is responsible for the transport of proteins from free polysomes
into the microbodies.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


similarity to S.pombe hypothetical repeat-containing protein 

complete cDNA, complete cds, 6 EST hits (3 from testis-derived
libraries)

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1134 bp 

Poly A stretch at pos. 1124, polyadenylation signal at pos. 1088


1 AACCTTTCAA GTGCCCCCTC CTTTCCTTAA AGTCTTTTAT AGGGGTCCCC 
51 TTCTTGGCCA TCTCCATCCT GTGAGTCAGG ACTGAAAGGG CACAGACAGG 
101 TCACTGCCAG CATTGTTGGG GCAAGCCTGC AAGCACGCAT CACTGGGGAT 
151 CTGACATGAC AATGGCCGCC TGCCCCCTCT GAGGGCTACA GGACTTACCC 
201 CAGTGGGAAG CAGCTAAGCA GGTCTGACCA GCCGACCTGG ACCTGGCCAA 
251 GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG 
301 ATGTTTTCCT TCAAGGTGAG CAGATGGATG GGGCTTGCCT GCTTCCGGTC 
351 CCTGGCGGCA TCCTCTCCCA GTATTCGCCA GAAGAAACTA ATGCACAAGC 
401 TGCAGGAGGA AAAGGCTTTT CGCGAAGAGA TGAAAATTTT TCGTGAAAAA 
451 ATAGAGGACT TCAGGGAAGA GATGTGGACT TTCCGAGGCA AGATCCATGC 
501 TTTCCGGGGC CAGATCCTGG GTTTTTGGGA AGAGGAGAGA CCTTTCTGGG 
551 AAGAGGAGAA AACCTTCTGG AAAGAGGAAA AATCCTTCTG GGAAATGGAA 
601 AAGTCTTTCA GGGAGGAAGA GAAAACTTTC TGGAAAAAGT ACCGCACTTT
651 CTGGAAGGAG GATAAGGCCT TCTGGAAAGA GGACAATGCC TTATGGGAAA 
701 GAGACCGGAA CCTTCTTCAG GAGGACAAGG CCCTGTGGGA GGAAGAAAAG 
751 GCCCTGTGGG TAGAGGAAAG AGCCCTCCTT GAGGGGGAGA AAGCCCTGTG 
801 GGAAGATAAA ACGTCCCTCT GGGAGGAAGA GAATGCCCTC TGGGAGGAAG 
851 AGAGGGCCTT CTGGATGGAG AACAATGGCC ACGTTGCCGG AGAGCAGATG 
901 CTCGAAGATG GGCCCCACAA CGCCAACAGA GGGCAGCGCT TGCTGGCCTT 
951 CTCCCGAGGC AGGGCGTAGC CAGCATGCAG GTGCAGGGCC CTGTGGTCCA 
1001 GACTCCCCTG GGTTGGGATT CAAGTCCAGG GTGAGCCCAT GTGCTGGAGA 
1051 AAATACACAC TCATTGGTCT CCTTGCTTTG AAAGATCCAA TAAAGTCCTG 
1101 AGGCAAGGTT TGGAAAACCA ACTTAAAAAA AAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 1 


ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164)
LEUCINE_ZIPPER (149-171)
LEUCINE_ZIPPER (156-178)
LEUCINE_ZIPPER (163-185)
LEUCINE_ZIPPER (170-192)

1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA
51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF
101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL
151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM
201 ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA

BLASTP hits


Entry SPCC330_4 from database TREMBLNEW:

gene: "SPCC330.04c"; product: "hypothetical repeat-containing protein";
S.pombe chromosome III cosmid c330.

Score = 149, P = 1.6e-08, identities = 55/187, positives = 88/187

Entry A45973 from database PIR:
trichohyalin - human

Score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194


Alert BLASTP hits for DKFZphtes3_72kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll, frame 1
Report for DKFZphtes3_72kll.1


[LENGTH] 233

[MW] 28752.65

[pI] 5.70

[PROSITE] LEUCINE_ZIPPER 5

[PROSITE] MICROBODIES_CTER

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] All_Alpha

[KW] LOW_COMPLEXITY 15.45 %


SEQ MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE
PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhh

SEQ KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh

SEQ FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED

PRD hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh

SEQ KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA
SEG xxxxxxxxxxxx

PRD ccchhhhhhhhhhhhhhhhhhccccchhhhhhcccccccccchhh


Prosite for DKFZphtes3_72kll.1

PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005
PS00005 35->38 PKC_PHOSPHO_SITE PDOC00005
PS00005 71->74 PKC_PHOSPHO_SITE PDOC00005
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006
PS00008 81->87 MYRISTYL PDOC00008
PS00342 231->234 MICROBODIES_CTER PDOC00299
PS00029 142->164 LEUCINE_ZIPPER PDOC00029
PS00029 149->171 LEUCINE_ZIPPER PDOC00029
PS00029 156->178 LEUCINE_ZIPPER PDOC00029
PS00029 163->185 LEUCINE_ZIPPER PDOC00029
PS00029 170->192 LEUCINE_ZIPPER PDOC00029


(No Pfam data available for DKFZphtes3_72kll.1)
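
The five overlapping leucine zipper hits reported for this clone can be reproduced with a simple pattern scan over the peptide. The sketch below is illustrative and rests on two assumptions: the motif is taken to be the pattern L-x(6)-L-x(6)-L-x(6)-L, and the end coordinate is taken to be start plus pattern length (22), a convention inferred from the coordinates in this listing. `PEPTIDE` and `leucine_zipper_hits` are hypothetical names; the sequence is the 233-residue frame 1 peptide given above.

```python
import re
from typing import List, Tuple

# Frame 1 peptide of the clone, as listed in the peptide information above.
PEPTIDE = (
    "MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKA"
    "FREEMKIFREKIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTF"
    "WKEEKSFWEMEKSFREEEKTFWKKYRTFWKEDKAFWKEDNALWERDRNLL"
    "QEDKALWEEEKALWVEERALLEGEKALWEDKTSLWEEENALWEEERAFWM"
    "ENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA"
)

# A lookahead is used so that overlapping occurrences are all reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zipper_hits(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 1-based, with end = start + 22.

    The end convention is an assumption chosen to mirror the listing.
    """
    hits = []
    for match in ZIPPER.finditer(seq):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


if __name__ == "__main__":
    for start, end in leucine_zipper_hits(PEPTIDE):
        print(f"{start}->{end} LEUCINE_ZIPPER")
```

Because each hit begins seven residues after the previous one, a plain non-overlapping search would report only the first; the lookahead form reports all five.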
DKFZphtes3_72kl5


group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus
norvegicus actin-filament binding protein Frabin.

FGD1-related F-actin-binding protein (Frabin) is a novel F-actin-binding protein. The
FGD1 gene locus appears to be responsible for faciogenital dysplasia (Aarskog-Scott syndrome).
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42
and the actin cytoskeleton. Cdc42p is an essential GTPase; in yeast, Cdc42p transduces signals
to the actin cytoskeleton to initiate and maintain polarized growth and to mitogen-activated
protein kinase pathways involved in morphogenesis. In mammalian cells, Cdc42p regulates a
variety of actin-dependent events and induces the JNK/SAPK protein kinase cascade, which leads
to the activation of transcription factors within the nucleus.

The novel protein seems to be the human orthologue of rat frabin.

The new protein can find application in modulating cell structure and motility as well as
the JNK/SAPK pathway.


strong similarity to actin-filament binding protein Frabin

2 EST hits
Sequenced by DKFZ
Locus: unknown

Insert length: 1845 bp 

Poly A stretch at pos. 1835, polyadenylation signal at pos. 1816 

1 GTGATGGAGA GTGCTGTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA 
51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT
101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA 
151 CACATACACT TGGTTATTAA GAATGGGAGC AGCAAGGAGT ATGGCAAGAA 
201 CACAGTGAGT TTTCCCTTGA GTGTGTGAGG AAGCCCTCAG AGTTTGTGAC
251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC 
301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT 
351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA 
401 GGACTGGCTA CACTGTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA 
451 AAAATGTTAG GAAGAGATGA TAAATACGTA AGTATTATAT CTAACTAAGT 
501 CTTTACTAAC TAGTCACATT ATTAAACAGT GCAAGGATCA AGAAAAGTTA 
551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG 
601 AAATGTTCCA GTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA 
651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGGT CTACACTAAG
701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA 
751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT 
801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT 
851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC 
901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC 
951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATGATACAG ATAAGACTCA 
1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA 
1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT 
1101 CAAGCTTCTG AACCCTTGCT TGATACGCAC ATAGTGAATG GAGAAAGAGA 
1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA 
1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC 
1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA 
1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA 
1351 AGGTAGAGCA TGAGACTAGC TCATGAGCAG GGAAAACCCT GCCTATTCGA 
1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA 
1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA 
1501 GCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG 
1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG GAGTCAGAAC AGTGTGGAAA 
1601 CTTTAATATA GGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT 
1651 GTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC 
1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA 
1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG 
1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA 


BLAST Results 


No BLAST result 
Medline entries


98334590:
Frabin, a novel FGD1-related actin filament-binding protein capable of
changing cell shape and activating c-Jun N-terminal kinase


Peptide information for frame 3 

ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility 

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 
51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 
101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 
151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72kl5, frame 3

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin";
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete
cds., N = 1, Score = 428, P = 1.8e-39


>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus
norvegicus actin-filament binding protein Frabin mRNA, complete cds.
Length = 766


HSPs: 


Score = 428 (64.2 bits), Expect = 1.8e-39, P = 1.8e-39
Identities = 90/174 (51%), Positives = 115/174 (66%) 

Query: 12 SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71

S LS+Y+D++K+S +NLN P+TP +HGLT+T QKL S PQ+Q D+D+ QG C+A
Sbjct: 31 SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90

Query: 72 NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131

NGV AAQ+QMECE EK A LS +T Q + D H++NG R+ET T AS T+S D N
Sbjct: 91 NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185

A DSS RT G LP +E E ++QERENG S L LDQHHE+K +E

Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204

Pedant information for DKFZphtes3_72kl5, frame 3


Report for DKFZphtes3_72kl5.3


[LENGTH] 188

[MW] 20388.32

[pI] 4.62

[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38

[KW] All_Alpha

[KW] SIGNAL_PEPTIDE 16

[KW] LOW_COMPLEXITY 12.77 %

SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT

PRD ccchhhhhcccccccccccccccccccccccccccccccccccchhhhhhhcc^^ 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP 
SEG xxxxx

PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 

SEG xxxxx 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 

SEG 

PRD hhhhhccc 

(No Prosite data available for DKFZphtes3_72kl5.3)
(No Pfam data available for DKFZphtes3_72kl5.3) 
DKFZphtes3_72pl6


group: intracellular transport and trafficking

DKFZphtes3_72pl6 encodes a novel 796 amino acid protein with very strong similarity to Mus 
musculus maternal-embryonic 3 (Mem3) gene. 

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also actively
transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is similar to
yeast VPS35 (vacuolar protein sorting 35). In yeast, the null allele of VPS35 results in a
differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA),
proteinase B (PrB), and alkaline phosphatase (ALP).

The new protein can find application in modulating the sorting of proteins into different
compartments.


strong similarity to mouse MEM3 and yeast VPS35 
Sequenced by DKFZ
Locus: /map="16p13.3"
Insert length: 2707 bp 

Poly A stretch at pos. 2697, no polyadenylation signal found 


1 CTACGCGCGG GGCGGGTGCT GCTTGCTGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC AGCAGTCCCC TCAGGATGAG CAGGAAAAGC TCTTGGATGA 
101 AGCCATACAG GCTGTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGG 
151 ACAAAAACAA GCTTATGGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCiW^G AGTTACTATG AACTTTATAT 
251 GGCCATTTCT GATGAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAAGGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA 
401 TGTCAAGTCA TTTCCTCAGT CCAGGAAGGA TATTTTGAAA GATTTGGTAG 
451 AAATGTGCCG TGGTGTGCAA CATCCCTTGA GGGGTCTGTT TCTTCGAAAT 
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTAGTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 
751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA 
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA 
851 GGTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGG 
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 
1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTGGCTACAG 
1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTGT ATCTTTACAA 
1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 
1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 
1201 TTGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 
1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTGAAATT 
1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 
1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TGGATTATAA CACAGAAATT 
1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 
1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG 
1501 CTGATGAGCA GAGCCTTGTG GGCCGCTTCA TTCATCTGCT GCGCTCTGAG 
1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 
1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 
1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 
1701 AAATGGGAAA AGAAATGCCA GAAGATTTTT TCATTTGCCC ACCAGACTAT 
1751 CAGTGCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 
1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 
1851 GTCGCATATG AATTCATGTC CCAGGCATTT TCTCTGTATG AAGATGAAAT 
1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 
1951 TTGAAAGGAT GAAGTGCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 
2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 
2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 
2101 AAAATGGGGA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 
2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 
2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 
2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 
2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 
2351 ACATTTTCAT AACACACTGG AGCATTTGCG CTTGCGGCGG GAATCACCAG 
2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 
2451 TCACCATACT CCTTTCCATG TACATCCAGT GAGGGTTTTA TTACGCTAGG 
2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGGT AGGTTTCCCA 
2551 TTTCTTACCT GTGATGTGTT TTACCCAGCA CCTCCGGACA CTCACCTTCA 
2601 GGACCTTAAT AAAATTATTC ACTTGGTAAG TGTTCAAGTC TTTCTGATCA 
2651 CCCCAAGTAG CATGACTGAT CTGCAATTTA AAATTCCTGT GATCTGTAAA 
2701 AAAAAAA 


BLAST Results 


Entry AC007225 from database EMBLNEW: 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 

Score = 1081, P = 2.8e-217, identities = 219/221 
13 exons - 

Entry HS01514 6 from database EMBL: 
human STS WI-8848. 
Score = 2033, P = 2.9e-87, identities = 425/436 


Medline entries 


96327632: 

Genetic mapping and embryonic expression of a novel, maternally 
transcribed gene Mem3. 

97258867: 

Endosome to Golgi retrieval of the vacuolar protein sorting receptor, 
Vps10p, requires the function of the VPS29, VPS30, and VPS35 gene products. 

92360909: 

Alternative pathways for the sorting of soluble vacuolar proteins in 
yeast: a vps35 null mutant missorts and 
secretes only a subset of vacuolar hydrolases. 

10198044: 

Distinct Domains within Vps35p Mediate the Retrieval of Two Different 
Cargo Proteins from the Yeast Prevacuolar/Endosomal Compartment 


Peptide information for frame 3 


ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 
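
The ORF coordinates, reading frame and peptide length quoted above are related by simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive, that the stop codon lies outside the stated range, and that the frame number is derived from the start position; the function names are illustrative and not part of the original annotation pipeline.

```python
def peptide_length(orf_start, orf_end):
    """Codon count for an ORF given 1-based, inclusive nucleotide coordinates."""
    n_nt = orf_end - orf_start + 1
    if n_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_nt // 3


def reading_frame(orf_start):
    """Forward reading frame (1, 2 or 3) implied by a 1-based start position."""
    return (orf_start - 1) % 3 + 1


# Values taken from the peptide information above: ORF from 48 bp to 2435 bp
length = peptide_length(48, 2435)
frame = reading_frame(48)
print(length, frame)
```

Under these assumptions the result agrees with the stated peptide length of 796 in frame 3.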


1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML 
51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY 
101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR 
151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ 
201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV 
251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII 
301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL 
351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR 
401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE 
451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS 
501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD 
551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE 
601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR 
651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL 
701 KKALKIANQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI 
751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72pl6, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05.7"; Arabidopsis thaliana BAC 
TM017A05., N = 2, Score = 927, P = 1.9e-162 

PIR:S56936 vacuolar protein-sorting protein VPS35 - yeast 
(Saccharomyces cerevisiae), N = 3, Score = 826, P = 1.5e-116 

TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds., N = 1, Score = 3376, P 

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar 
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt] N = 
3, Score = 813, P = 4.4e-115 


>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds. 
Length = 754 

HSPs: 

Score = 3376 (506.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 666/721 (92%), Positives = 682/721 (94%) 


Query:  78 EVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 137
           +VYLTDEFAKG ++ADLYELVQY+GNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC
Sbjct:  34 KVYLTDEFAKGERLADLYELVQYSGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 93

Query: 138 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 197
           RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM
Sbjct:  94 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 153

Query: 198 QHQGHSRDREKRERERQELRILVGTNLVRLSQLEG-VNVERYKQIVLTGILEQVVNCRDA 256
           QHQGHSRDREKRERERQELRILVGTNLV L+ +         +QIVLTGILEQVVNCRDA
Sbjct: 154 QHQGHSRDREKRERERQELRILVGTNLVALTLVSWRCKCGTLQQIVLTGILEQVVNCRDA 213

Query: 257 LAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREDGP 316
           LAQE  MECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHRE  P
Sbjct: 214 LAQEISMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREMEP 273

Query: 317 GIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 376
           GIPA++KLFDIFSQQVATVIQSR+DMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT
Sbjct: 274 GIPAELKLFDIFSQQVATVIQSRRDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 333

Query: 377 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESR--K 434
           VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES   K
Sbjct: 334 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESSPGK 393

Query: 435 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 494
           SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF
Sbjct: 394 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 453

Query: 495 IHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKVDDKWE 554
           IHLLRS+DPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSK     +
Sbjct: 454 IHLLRSDDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKWMTSGK 513

Query: 555 KKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 614
           +  ++ F   HQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY
Sbjct: 514 RNARRYFHLPHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 573

Query: 615 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKLLKKPDQGRAVSTCA 674
           EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ      C
Sbjct: 574 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTECALAASKLLKKPDQAEREHMCT 633

Query: 675 HLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 734
            L WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE
Sbjct: 634 SL-WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 692

Query: 735 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLRRESPESEGPIYEGL 794
           NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLR RRESPESEGPIYEGL
Sbjct: 693 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRTRRESPESEGPIYEGL 752

Query: 795 IL 796
           IL
Sbjct: 753 IL 754
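
The percentages in the HSP header can be checked against the raw counts. The following is a minimal sketch, assuming the percentages are truncated to whole numbers rather than rounded; this is inferred from 682/721 being reported as 94% although it is closer to 94.6%. The function name is illustrative.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated (not rounded), from match counts."""
    return matches * 100 // aligned_length


# Counts taken from the HSP header above
identity_pct = blast_percent(666, 721)
positive_pct = blast_percent(682, 721)
print(identity_pct, positive_pct)
```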


Pedant information for DKFZphtes3_72pl6, frame 3 
Report for DKFZphtes3_72pl6.3 

[LENGTH] 796 




[MW] 91723.67 
[pI] 5.32 
[HOMOL] TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus maternal-embryonic 3 (Mem3) mRNA, complete cds. 0.0 
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.22 endosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110 
[BLOCKS] BL01092Q 
[PIRKW] yeast vacuole 1e-108 
[PIRKW] membrane protein 1e-108 
[KW] TRANSMEMBRANE 1 
[KW] LOW COMPLEXITY 5.40 % 


SEQ MPTTQQSPQDEQEKLLDEAIQAVKVQSFQMKRCLDKNKLMDSLKHASNMLGELRTSMLSP 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ KSYYELYMAISDELHYLEVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVK 

SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceeeeeceeeee 

MEM MMMMMMMMMMMMMM 

SEQ SFPQSRKDILKDLVEMCRGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSM 

SEG xxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccccccccccccccch 

MEM MMMMMMMMMM 

SEQ DFVLLNFAEMNKLWVRMQHQGHSRDREKRERERQELRILVGTNLVRLSQLEGVNVERYKQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhccchhhhhh 

MEM 


SEQ IVLTGILEQVVNCRDALAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh 

MEM 

SEQ ALIDRLALFAHREDGPGIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMK 

SEG 

PRD hhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhh 

MEM 

SEQ CYPDRVDYVDKVLETTVEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKH 

SEG 

PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh 

MEM 

SEQ FHPLFEYFDYESRKSMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPD 

SEG xxxxxxxxxxxx 

PRD hhhheeecccchhhhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc 

MEM 

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLA 

SEG XXX 

PRD ccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhcccccceeeeeccchhhhhhhhh 

MEM 

SEQ FRYKENSKVDDKWEKKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHE 

SEG 

PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM 

SEQ TVAYEFMSQAFSLYEDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKL 

SEG 

PRD eeeeehhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

MEM 

SEQ LKKPDQGRAVSTCAHLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLF 

SEG 

PRD hhcccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh 

MEM 

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLR 

SEG 

PRD hhhhhhhhhhhccccceeeeehhhhhhhhhhhhhhhhhccccchhhhhhhhh^^ 

MEM 

SEQ RESPESEGPIYEGLIL 

SEG 

PRD hhcccccccceeeccc 
MEM 

(No Prosite data available for DKFZphtes3_72pl6.3) 
(No Pfam data available for DKFZphtes3_72pl6.3) 

group: cell structure and motility 

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins. 

The novel protein is related to paramyosin, a major structural component of thick filaments 
of invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. 

The new protein can find application in modulating cell adhesion/motility and 
membrane/cytoskeleton structure and dynamics. 


similarity to paramyosins 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213 


1 GGAAGAAAGG CTAGCGGGCG TTGGCCGTAT GTGGGTGTCT TGAGGCAGTT 
51 TTTCAGTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 
101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 
151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TGCACCTGAG ATAAGGGGGA 
201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA 
251 TACAAAATGG GAAATTGGGA CAAATCCCAG TGGCTCATGA CACTAAGAAG 
301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 
351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGG GTAACGAAGC 
401 TACAGAAGAA TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA 
451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 
501 CCGTAGAAGA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 
551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 
601 CTCGGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 
651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA 
701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC 
751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC 
801 CAGAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 
851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC 
901 ACTGAAGAAA ATTCAGATTG ATAGGCAGTT TTTCAGCGAT GTGATTGCAG 
951 ATACCATTAA GGAGTTGCAA GATTCGGCCA CTTACAACAG TCTCCTGCAA 
1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 
1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 
1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 
1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATGAAGGCAA AATCCAACTT 
1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT GCCCAGACCC 
1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 
1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT 
1351 GTTCCTTAGA AAGGAGCAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 
1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT 
1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA 
1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCGT ATAGAAAAGG 
1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 
1601 ATAAAGCTCC AGGCCTGGTG GCGAGGCACT ATGATACGGA GAGAAATTGG 
1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTGA TAGCAAGGAT TCAAAAGGCA 
1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 
1751 CTTTTGTGTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 
1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 
1851 TGAGACTTTC CCAGGGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 
1901 TGCCTGTTAG GTGGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT 
1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 
2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 
2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 
2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 
2151 TCAGTAGGAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 
2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 
2251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 


BLAST Results 


Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA. 

Medline entries 

No Medline entry 

Peptide information for frame 2 


ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 


1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP 
51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN 
101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV 
151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS 
201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN 
251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM 
301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA 
351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL 
401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7b22, frame 2 

SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08 

PIR:A44972 paramyosin - nematode (Dirofilaria immitis) (fragment), N = 
1, Score = 157, P = 7.1e-08 

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08 

PIR:S52537 emm L 15 protein - Streptococcus pyogenes, N = 1, Score = 
151, P = 8.6e-08 

>SWISSPROT:MYSP_BRUMA PARAMYOSIN. 
Length = 880 

HSPs: 

Score = 158 (23.7 bits), Expect = 5.8e-08, P = 5.8e-08 
Identities = 66/259 (25%), Positives = 125/259 (48%) 

NKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALSK 201 

K + L K R T E K++ + +D +A + LQ A N LL+ + 
KKDKHLAEKAAERFEAQTVELSNKVEDLNRHVND- LAQQRQRLQ- - AENNDLLKE I HD 225 

ENKMHF-YDIIAREEKGRKQIISLQKQLINVKKEWQFEVQSQNEYIANLKDQLQE 257 

+N H Y + + E+ R+++ +++ ++ + +VQ + + + D+ E 


A++ E++ NTE I Q + K + L EE+E LR K +++A +IE+ L 


RKEQQ- -KLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQDLAKMIREYEOVI I 374 

+K Q K + RL+ +E DEQN+L+K +LK + E+ I 

QKI S QLEKAKS RLQSE VEVLI VDLEKAQNTI AX LBRAK EQLEKTVNELKVRI D 3 93 


Query: 

142 

Sbjct: 

169 

Query: 

202 

Sbjct: 

226 

Query: 

258 

Sbjct: 

283 

Query: 

317 

Sbjct: 

341 

Query: 

375 

Sbjct: 

394 

Score = 

118 

Identities = 

Query: 

181 

Sbjct: 

218 


E +E E ++++ + L EL+ + L 


.3e-03, P = 1.3e-03 
s « 108/231 (46%) 


D +KE+ D LQ L+++ E + RE + Q+ +Q +L +V+ 
Query: 236 EWQFE--VQSQNEY-IANLKDQLQEMKAKSNLENRYMKTNTE-LQIAQTQKKCNRTEELL 291 
E +++ E+ +A ++ + K+K + E E L+ QK+ E++ 

Sbjct: 278 ALDEESAARAEAEHKLALANTEITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIE 337 

Query: 292 VEEIEKLRMKTEEEARTHTEIEMF---LRKEQQKLE--ERLEFWMEKYDKDTEMKQNELN 346 

+ ++K+ + ++R +E+E+ L K Q + ER + +EK + +++ +Eh 
Sbjct: 338 IM-LQKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAKEQLEKTVNELKVRIDELT 396 

Query: 347 A-LKATKASDLAHLQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVI 398 

L+A + A L +L K+ YE+ + E + R KK++ DL E K + 
Sbjct: 397 VELEAAQREARAALAELQKLKNLYEKAV-EQKEALARENKKLQDDLHEAKEAL 448 

Score = 107 (16.1 bits), Expect = 2.1e-02, P = 2.1e-02 
Identities = 49/279 (17%), Positives = 124/279 (44%) 

Query: 123 ITEEGPNLPEIRHRGRFAV-EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIAD 181 

IE L + R A+ E K+++L K ++ + E KK+Q D + +AD 

Sbjct: 392 IDELTVELEAAQREARAALAELQKLKNLYEKAVEQKEALAREN-KKLQDDLHEAKEALAD 450 

Query: 182 TIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQ--IISLQKQLINVKKEWQF 239 

++L + N+L +E+ + + R+ + RQ + LQ+ I +++ Q 
Sbjct: 451 ANRKLHELDLENARLAGEIRELQTALKESEAARRDAENRAQRALAELQQLRIEMERRLQE 510 

Query: 240 EVQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTE-ELLVEEIEKL 298 

+ + N + + +++A L+ + E+ + + + EE-I-V+ + + 

Sbjct: 511 KEEEMEALRKNMQFEIDRLTAA— LADAEARMKAEISRLKKKYQAEIAELEMTVDNLNRA 568 

Query: 299 RMKTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAH 358 

++ ++ + +E L+ + + +L+ +++Y + Q +++AL A + + 

Sbjct: 569 NIEAQKTIKKQSEQLKILQASLEDTQRQLQQTLDQY ALAQRKVSALSA-ELEECKV 623 

Query: 359 LQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQ 401 

DA R+ ++ +E+ + V +L +K+ ++ + 

Sbjct: 624 ALDNAIRARKQAEIDLEEANGRITDLVSVNNNLTAIKNKLETE 666 


Pedant information for DKFZphtes3_7b22, frame 2 


Report for DKFZphtes3_7b22.2 


[LENGTH] 443 
[MW] 51917.95 
[pI] 6.18 
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-07 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-07 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR356w] 2e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL20Xc] 5e-04 
[EC] 3.6.1.32 Myosin ATPase 3e-08 
[PIRKW] phosphotransferase 6e-06 
[PIRKW] citrulline 8e-06 
[PIRKW] tandem repeat 1e-07 
[PIRKW] heart 6e-06 
[PIRKW] polymorphism 4e-06 
[PIRKW] serine/threonine-specific protein kinase 6e-06 
[PIRKW] DNA binding 8e-08 
[PIRKW] muscle contraction 1e-07 
[PIRKW] actin binding 3e-08 
[PIRKW] ATP 3e-08 
[PIRKW] thick filament 1e-07 
[PIRKW] phosphoprotein 3e-08 
[PIRKW] glycoprotein 4e-06 
[PIRKW] skeletal muscle 1e-07 
[PIRKW] calcium binding 8e-06 
[PIRKW] alternative splicing 3e-08 
[PIRKW] coiled coil 3e-08 
[PIRKW] P-loop 3e-08 
[PIRKW] heptad repeat 4e-06 
[PIRKW] methylated amino acid 3e-08 
[PIRKW] basement membrane 4e-06 
[PIRKW] cardiac muscle 6e-06 
[PIRKW] extracellular matrix 4e-06 
[PIRKW] hydrolase 3e-08 
[PIRKW] membrane protein 4e-06 
[PIRKW] EF hand 8e-06 
[PIRKW] cytoskeleton 8e-06 
[PIRKW] hair 8e-06 
[SUPFAM] myosin heavy chain 3e-08 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e- 
[SUPFAM] calmodulin repeat homology 8e-06 
[SUPFAM] myosin motor domain homology 3e-08 
[SUPFAM] trichohyalin 8e-06 
[SUPFAM] protein kinase homology 6e-06 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] AllAlpha 
[KW] LOW COMPLEXITY 10.61 % 


SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeeccccccccccccc 

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 
SEG 
PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhliltihh^ 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccc 


Prosite for DKFZphtes3_7b22.2 


PS00001  285->289  ASN_GLYCOSYLATION  PDOC00001 
PS00004  152->156  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  164->167  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  182->185  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  280->283  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  383->386  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  5->9      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  30->34    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  41->45    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  57->61    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  104->108  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  182->186  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  243->247  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  262->266  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  302->306  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  308->312  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  310->314  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  261->269  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  184->193  TYR_PHOSPHO_SITE   PDOC00007 
PS00009  218->222  AMIDATION          PDOC00009 
PS00009  439->443  AMIDATION          PDOC00009 
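
The site ranges in the Prosite listing can be reproduced by pattern matching on the peptide. The following is a minimal sketch for the N-glycosylation entry (PS00001, consensus N-{P}-[ST]-{P}), assuming the listed ranges use a 1-based start and an end equal to start plus pattern length; that convention is inferred from the listing, and the helper name is illustrative. Only residues 281-290 of the 443 amino acid peptide are used.

```python
import re

# PROSITE PS00001 consensus N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead allows overlapping sites to be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sites(fragment, offset):
    """Return (start, end) pairs; start is 1-based, end = start + 4 (assumed convention)."""
    return [(m.start() + offset, m.start() + offset + 4)
            for m in N_GLYC.finditer(fragment)]


# Residues 281-290 of the peptide shown earlier in this entry
fragment = "QKKCNRTEEL"
sites = find_sites(fragment, offset=281)
print(sites)
```

Under these assumptions the single match corresponds to the 285->289 entry for PS00001.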


(No Pfam data available for DKFZphtes3_7b22.2) 

DKFZphtes3_7dl7 


group: testes derived 

DKFZphtes3_7dl7 encodes a novel 633 amino acid protein with weak similarity to human KIAA0454. 
Pfam predicts a TNFR/NGFR cysteine-rich region. 

No informative BLAST results; no predictive Prosite or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570 

1 GGGAAGTTAC GGCGAAGTCC ACCCAGCGTT TCTCAGGCAA TCTGAAGGCA 
51 AATCCTGTTT AGACCCAGGC GAAGGTTCCT GGTGACCCAG GCTCTCACCA 

101 GCCAATTGTC CCTTGCCGTC CTCCTGAGGG TATCTGGAGC TTCAGTGCTG 

151 TGTGCTCTTG GCCTCCACAC TGGGGATGCC ACTGACTCCC ACTGTCCAGG 

201 GCTTCCAGTG GACTCTCCGA GGCCCTGATG TAGAAACTTC CCCATTCGGT 

251 GCACCAAGAG CAGCCTCACA TGGTGTGGGC CGACATCAAG AGCTGCGAGA 

301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 

351 TATCTGCCGG CCCTTGGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA 

401 ATCAACAAGA AATCGCGCCC CCAGCTGGCA GAGAACAAAC AGCAGTTCAG 

451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA 

501 ACCGGCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 

551 ATGCTGAGGG ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 

601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 

651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGG GAGAGATGCC 

701 TCCCGCTCAT TGAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 

751 GGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA 

801 GGCTGGCACA GCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 

851 GAGGATGAAG ATGTTAAAGT TGAGGAGGCT GAGAAAGTAC AGGAATTATA 

901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGAGGACT 

951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 
1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 
1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATGG TTGGATGCTG 
1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 
1151 GGGCCAGTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 
1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 
1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 
1301 CAGCAAGTCG GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 
1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC 
1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATTTTAT 
1451 TCAACTCCTT TTGAGTACCT GGAACTGCCT GACTTATGCC AGCCCTACAG 
1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 
1551 TGGACAGAAT GAAAAAGGAC CAAGAAGAGG AAGAAGACCA AGGCCCACCA 
1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 
1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCAGT TATCCAGAAC 
1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 
1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 
1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 
1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 
1901 TCGACTACTT CAACTTACTT TCAACTACAT GCCTCATTCC AGCAGTACAG 
1951 AAGTGCCTTT TACTCATTTG AGGAACAGGA CGTCAGCTTG GCCCTTGACG 
2001 TGGACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC 
2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 
2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 
2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 
2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CAGTCTGAAG 
2251 ACGTTGGACC CAAGTTAGGT GTGACACGTT CACACGACTA TGTAGCACAT 
2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 
2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 
2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 
2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGTA TCTTCAGTGT 
2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 
2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTGT TAACCCACTA 
2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCAGCCTCCA ATTGATATCA 
2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG GGAGGCCTTA 
2701 GTCCTGCTCC TTTCAATTCC ATCCTGTAAA GAACAGGAGT CAGGAGCCGC 
2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC 
2801 AATGCCATGT TCTTGCAGAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA 
2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT 
2901 CTCCTTCACA CAGTCCACGT CACCACGAAT CACACAACAA AAAGGAGGAG 
2951 AGATATTTTG GGTTCAGAAG AAGTAAATGA TAATGTAGCT ACATTTCTTT 
3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTGA 
3051 TTTTGGTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA 
3101 ATGGTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG 
3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA 
3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT 
3251 CATGATATCA GGACTGGTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT 
3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT 
3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT 
3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC 
3451 GCTGGAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG 
3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA 
3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA 
3601 AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 176 bp to 2074 bp; peptide length: 633 
Category: similarity to known protein 


1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS 
51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV 
101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ 
151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL 
201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA 
251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS 
301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD 
351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT 
401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE 
451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY 
501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP 
551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE 
601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7dl7, frame 2 

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1, 
Score = 199, P = le-11 

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1, 
Score =158, P = 2.7e-07 


>PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 
Length = 1882 

HSPs: 

Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11 
Identities = 74/261 (28%), Positives = 122/261 (46%) 

Query: 117 EDCKDLIKSMLRDERLLT EEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEG 172 

+D + LI+ + + E L EEKLAEEL A +Y L+ Q REL+ LR+K++EG 

Sbjct: 964 KDLESLIQRVSQLEAQLPKMGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023 
Query: 173 RDASRSLNQH-------LQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 225 

o..- ^ "^^ + LL ++ D G+ REQLA+G +L + L KLS ^- + 

Sbjct: 1024 RGICYLITRHAKDTVKSFEDLLRSNDIDYYLGQSFREQLAQGSQLTERLTSKLSTKDHKS 1083 

Query: 226 EDEDVKVEEAEKVQELYAPREVQKAEEK-EVPEDSLEECAITCSNSHHPCESNQPYGNTR 284 

^ + ^ RE+Q+ E+ EV + L+ ++T S+SH +S++ +T 

Sbjct: 1084 EKDQAGLEPLA LRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 285 ITFEEDQV-DSTLIDSSSHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAP 342 

+E + D ++ +H E A P + +S + S + A 

Sbjct: 1140 FLSDELEACSDMDIVSEYTHYEEKKAS — PSHSDSIHHSSHSAVLSSKPSSTSASQGAK 1196 

Query: 343 QESWDEGDWTLSIPPDMSASYQSDRSTFH 371 

ES + +L P + s FH 

Sbjct: 1197 AES-NSNPISLPTPQNTPKEANQAHSGFH 1224 

Score = 89 (13.4 bits), Expect = 1.1e-01, P = 1.0e-01 
Identities = 35/89 (39%), Positives = 44/89 (49%) 

Query: 464 KDQEEEEDQG---PPCPRLSRELPEVVEP-EDLQDSLDRWYSTPFSYPELPDSCQ-PYGS 518 

KD + E+DQ P RLSREL E + E LQ LD TP S L DS + P + 
Sbjct: 1079 KDHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSST 1138 

Query: 519 CFYSLEEEHVGFSLDVDEIEKYQEGEEDQKPP 550 

F S E E D+D + +Y EE + P 

Sbjct: 1139 SFLSDELEACS---DMDIVSEYTHYEEKKASP 1167 

Score = 73 (11.0 bits), Expect = 4.8e+00, P = 9.9e-01 
Identities = 31/88 (35%), Positives = 40/88 (45%) 

Query: 390 DQVKKEDQEATSP— RLSRELLD-EKEPEVLQDSLDRFYSTPFEYLELPDLCQ-PYRSD 444 

D ++DQ P RLSREL + EK EVLQ LD TP L D + P + 

Sbjct: 1080 DHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 445 FYSLQEQHLGLALDLDRMKKDQEEEEDQGPP 475 

F S L D+D + + EE + P 

Sbjct: 1140 FLS DELEACSDMDIVSEYTHYEEKKASP 1167 

Score = 68 (10.2 bits), Expect = 1.1e-01, P = 1.0e-01 
Identities = 36/156 (23%), Positives = 68/156 (43%) 

Query: 31 SHGVGRHQELRDPTV— PGPTSSATNVSMWSAGPWS GEKAEMNILEINKK 79 

S G +HQE +TVPPS +VA G++++ + 

Sbjct: 684 SPGKHQHQEEGNVTVRPFPRPQSLDLGATFTVDAHQLDNQSQPRDPGPQSAFSLPGSTQH 743 

Query: 80 SRPQLAENKQQFRNLKQKCLVTQVAYFL-ANRQNNYDYE-DCKDLIKSMLRDERLLTEEK 137 

R QL++ KQ++++L++K L+++ F AN Y + l+K + ++ 

Sbjct: 744 LRSQLSQCKQRYQDLQEKLLLSEATVFAQANELEKYRVMLTGESLVKQDSKQIQVDLQDL 803 

Query: 138 LAEELGQAEELRQYKVLVHSQERELTQLREK-LQEG 172 

E G++E + + + E L+E L EG 

Sbjct: 804 GYETCGRSENEAEREETTSPECEEHNSLKEMVLMEG 839 

Score = 65 (9.8 bits), Expect = 2.2e-01, P = 2.0e-01 
Identities = 23/96 (23%), Positives = 52/96 (54%) 

Query: 123 IKSMLRDERLLTEEKLAEELGQAEE LRQYKVLVHSQERELTQLREKLQEGRDASRS 178 

CKS «. ++ + D+ + E + E+ EE LRQ V ++ +l +LR+ L ++ + 

Sbjct: 5 LRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLERLRDVLS SNEA 60 

Query: 179 LNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKL 218 

Q ++G ++ EQL+ C+ Q L +++ 

Sbjct: 61 TMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 93 

Score = 61 (9.2 bits), Expect = 5.5e-01, P = 4.2e-01 
Identities = 27/95 (28%), Positives = 47/95 (49%) 

Query: 134 TEEK-LAEELGQAEELRQY-— KVLVHSQERELTQLREKLQEGRDASRSLNQHLQALLT 188 

+E K L +LG+ EE R Y +LV +++ l+ +LQ +5h.l 

Sbjct: 855 SERKPLENQLGKQEEFRVYGKSENILV--LRKDIKDLKAQLQNANKVIQNLKSRVRSLSV 912 

Query: 189 PDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDE 228 

+ +5 R R+ A G ++ SP + DEDE 

Sbjct: 913 TSDYSSSLERP-RKLRAVGT LEGSSPHSVPDEDE 945 

Score = 57 (8.6 bits), Expect = 1.4e+00, P = 7.5e-01 
Identities = 26/92 (28%), Positives = 47/92 (51%) 

Query: 127 LRDERLLTEEKLAEELGQAEEL—RQYKVLVHSQERELTQLREKLQEGRDASRSLNQHL 183 

L E LL EK+A Q +E+ R+ ++L+ + L R+LE ARL L 
Sbjct: 358 LTQEVLLLREKVASVESQGQEISGNRRQQLLLMLEG-LVDERSRLKEALQAERQLYSSL 415 
Query: 184 QALLTPDEPDNSQ-GRDLREQLAEGCRLAQHLVQKL 218 

P++S+ R L+ ¥L EG ++ + ++++ 
Sbjct: 416 VKFHA— HPESSERDRTLQVBL-EGAQVLRSRLEEV 448 

Score = 54 (8.1 bits), Expect = 2.7e+00, P = 9.3e-01 
Identities = 61/264 (23%), Positives = 121/264 (45%) 

Query: 3 ltptvqgfqwtlrgpdvetspfgapraashgvgrhqe--lrdptvpgptssatnvsmvvs 60 

L+ T Q QW L+ ++ET F + + + + L D SAT ++ 
Sbjct: 79 LSTTCQNLQW-LK-EEMETK-FSRWQKEQESIIQQLQTSLHDRNKEVEDLSAT LLCK 132 

Query: 61 AGPWSGEKAEMNILEINKKSR PQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYE 117 

GP E AE + +K R ' L++ +Q L+ + + + ++ R-i- 

Sbjct: 133 LGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQV--LEHEMEIQGLLQSVSTREQE-SQA 189 

Query: 118 DCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELT QLREKLQEG-- 172 

+ L++4-++ ER +L + LG+L ++ +Q+ E+T +L ++ +G 
Sbjct: 190 AAEKLVQALM—ERNSELQALRQYLGGRDSLMS-QAPISNQQAEVTPTGRLGKQTDQGSM 246 

Query: 173 RDASRSLNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKV 232 

+ SR + L A P ++ G DL + +A G L ++LS N +E E + 

Sbjct: 247 QIPSRDDSTSLTAKEDVSIPRSTLG-DL-DTVA-G LEKELS— NAKEELELMAK 295 

Query: 233 EEAEKVQELYAPREVQKAEEKEVPEDSLEECAIT 266 

+E E EL A + + +E+E+ + + ++T 
Sbjct: 296 KERESQMELSALQSMMAVQEEELQVQAADMESLT 329 

Score = 49 (7.4 bits), Expect = 6.3e+00, P = 1.0e+00 
Identities = 21/87 (24%), Positives = 39/87 (44%) 

Query: 192 PDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQELYAPREVQKAE 251 

P ++Q LR QL++ + Q L +KL + +EEK + + +K + 

Sbjct: 738 PGSTQ—HLRSQLSQCKQRYQDLQEKLLLS EATVFAQANELEKYRVMLTGESLVKQD 792 

Query: 252 EKEVPEDSLEECAI-TCSNSHHPCESNQ 278 

K-»-+ D L++ TC S + E + 
Sbjct: 793 SKQIQVD-LQDLGYETCGRSENEAEREE 819 

Score = 46 (6.9 bits), Expect = 6.3e+00, P = 1.0e+00 

Identities = 19/77 (24%), Positives = 39/77 (50%) 

Query: 112 NNYDYEDCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQ- 170 

+ ++ E+ K+ K + E ++T+E L+E QAE R+ + + + + L+E+L 
Sbjct: 597 DGWEIEEDKE— KGEVMVETWTKEGLSESSLQAE-FRKLQGKLKNAHNIINLLKEQLVL 653 

Query: 171 EGRDASRSLNQHLQALLT 188 

++ + L L LT 
Sbjct: 654 SSKEGNSKLTPELliVHLT 671 
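
The Expect and P values in the HSP headers above appear to follow the Poisson relation P = 1 - exp(-E), and the identity and positive percentages appear to be truncated rather than rounded. The following Python sketch reproduces several of the listed figures under those two assumptions. The function names are illustrative only and are not part of any BLAST distribution.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP, assuming Poisson statistics."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect)


def percent(matches: int, aligned: int) -> int:
    """Percentage truncated to an integer, as the listing appears to report it."""
    return 100 * matches // aligned


if __name__ == "__main__":
    # Expect values taken from the HSP headers above.
    for expect in (1.0e-11, 0.11, 0.22, 0.55, 1.4, 2.7, 6.3):
        print(f"Expect = {expect:.1e}, P = {p_from_expect(expect):.1e}")
    # Identities and positives of the first HSP (74/261 and 122/261).
    print(percent(74, 261), percent(122, 261))
```

A check of this kind is also useful when a digit in a scanned header is ambiguous, since only a narrow range of Expect values is compatible with a given printed P value.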


Pedant information for DKFZphtes3_7dl7, frame 2 


Report for DKFZphtes3_7dl7.2 

[LENGTH] 633 

[MW] 72951.15 

[pI] 4.40 

[HOMOL] PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 2e-11 

[BLOCKS] BL00201E 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 4.90 % 

[KW] COILED_COIL 6.95 % 


SEQ MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS 

SEG 

PRO ccccceeeeeeeecccccccccccccccccccccccccccccccccccccceeeeeeeee 

COILS 

SEQ AGPWSGEKAEMNILEINKKSRPQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYEDCK 

SEG 

pro ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhhhcccccccccch 

COILS 
SEQ 
SEG 
PRD 
COILS 


SEQ 
SEG 
PRD 


DLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEGRDASRSLN 
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlihhhhhhhhhlihhhhhh^ 


cccchhhhh 


.ccccccccccccccccccccccccccccccccccccc 

QHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQE 


COILS ccccccc*'''*^'''''^''''''''^^^^*'^^^*'^^^^ 


.XXXXXXXXXXXXXXXX . , 


SEQ 
SEG 
PRD 
COILS 


LYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLIDSS 
hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccceeeee^ 


SHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAPQESWDEGDWT 


SEQ 

Don Jll * uKuwu XXXXXXXXXXXXXXX 

COILS ^ ^^^^^^^^^^^^'^^^'^^'^^^^^^^^^^ccccccccchhhhhhh 


LSIPPDMS 


ccccccccccccccccccc 


SEQ 
SEG 
PRD 
COILS 


ASYQSDRSTFHSVEEQQVGLALDIGRHWCDQVKKEDQEATSPRLSRELLDEKEPEVLQDS 
ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccchhhh^ 


SEQ 
SEG 

PRD hhhhhccc 
COILS 


LDRFYSTPFEYLELPDLCQPYRSDFYSLQEQHLGLALDLDRMKKDQEEEEDQGPPCPRLS 
eeeeecccccccccccchhhhhhhhhhhhhcchhhhhhhhhhccccre^ 


SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 


RELPEVVEPEDLQDSLDRWYSTPFSYPELPDSCQPYGSCFYSLEEEHVGFSLDVDEIEKY 
ccceeeeeccchhhhhhhhhccccccccccccccccccceeeeccce^ 

QEGEEDQKPPCPRLNEVLMEAEEPEVLQDSLDRCYSTTSTYFQLHASFQQYRSAFYSFEE 
hcccccccccccchhhhhhhhhchhhhhccccceeecceeeehhhhhhh^^ 


QDVSLALDVDNRFFTLTVIRHHLAFQMGVI FPH 


SEQ 
SEG 

PRD cchhhhhhcccchhhhhhhhhhhhhhhhhcccc 
COILS 


Prosite for DKFZphtes3_7dl7.2 

PS00001    54->58    ASN_GLYCOSYLATION    PDOC00001 
PS00001    315->319  ASN_GLYCOSYLATION    PDOC00001 
PS00005    13->16    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    329->332  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    365->368  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    401->404  PKC_PHOSPHO_SITE     PDOC00005 
PS00006    188->192  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    259->263  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    286->290  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    295->299  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    300->304  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    317->321  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    336->340  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    345->349  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    372->376  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    427->431  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    447->451  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    505->509  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    522->526  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    597->601  CK2_PHOSPHO_SITE     PDOC00006 
PS00008    25->31    MYRISTYL             PDOC00008 
PS00008    207->213  MYRISTYL             PDOC00008 
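
The short consensus motifs reported in the Prosite table can be located with a simple pattern scan. The following Python sketch is an illustration only: it assumes the usual consensus patterns for PS00001, PS00005, PS00006 and PS00008, transcribes them as regular expressions, and reports each hit as start->start+length (1-based start), which is the convention the table appears to use. The names `PATTERNS` and `scan` are hypothetical and do not come from the Pedant software.

```python
import re

# Assumed PROSITE consensus patterns, transcribed as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",       # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",            # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",           # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # PS00008
}


def scan(sequence: str) -> dict:
    """Return {motif name: [(start, start + length), ...]} with 1-based starts."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [
            (m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(sequence.upper())
        ]
    return hits


if __name__ == "__main__":
    # First 60 residues of the DKFZphtes3_7dl7.2 peptide.
    fragment = "MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS"
    for name, positions in scan(fragment).items():
        for start, end in positions:
            print(f"{start}->{end} {name}")
```

Applied to the first 60 residues, the scan finds the 13->16, 25->31 and 54->58 entries of the table; such short motifs occur frequently by chance, so hits of this kind are weak evidence on their own.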


Pfam for DKFZphtes3_7dl7.2 

HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtDWNHvpqCXpCtrCePEMGQYMvqPCTwTQNTVC* 
C+ ++ + N+ ++ + ++ + +++ +++ ++VC 
Query 274 CESNQPYG-NT-RITFEEDQVDS--TLIDSSSHDEWLDAVC 310 
DKFZphtes3_7j3 


group: cell cycle 

DKFZphtes3_7j3.2 encodes a novel 628 amino acid putative protein kinase, which is related to 
the C-TAK1 Cdc25C-associated protein kinase. 

Cdc25C is a protein phosphatase that controls entry into mitosis by dephosphorylating Cdc2. 
Cdc25C function is itself regulated by phosphorylation: phosphorylation of serine 216 of Cdc25C 
mediates the binding of 14-3-3 protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein 
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to 
C-TAK1 and therefore should be involved in cell-cycle regulation, too. 

The new protein can find application in modulating/blocking the cell cycle. 


strong similarity to serine/threonine-specific protein kinases 

complete cDNA, complete cds, potential start at bp 128, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3443 bp 

Poly A stretch at pos. 3399, polyadenylation signal at pos. 3376 
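
The two positions given above can be reproduced with a simple scan of the insert. The following Python sketch is an illustration under stated assumptions: it searches for the common hexamers AATAAA and ATTAAA and for the first run of at least ten adenines, and it reports zero-based offsets, which is the convention the listed positions appear to follow. The function name, the hexamer list and the run-length threshold are assumptions, not the method used to annotate the clone.

```python
import re

# Assumed canonical polyadenylation hexamers.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_features(seq: str, offset: int = 0, min_run: int = 10) -> dict:
    """Locate the first long adenine run and any polyadenylation hexamers.

    Positions are zero-based offsets into the full insert; `offset` is the
    zero-based position of the first base of `seq` within that insert.
    """
    seq = seq.upper()
    run = re.search("A{%d,}" % min_run, seq)
    signals = sorted(
        offset + m.start()
        for hexamer in POLYA_SIGNALS
        for m in re.finditer(hexamer, seq)
    )
    return {
        "poly_a_start": None if run is None else offset + run.start(),
        "signals": signals,
    }


if __name__ == "__main__":
    # Bases 3351 to 3443 of the DKFZphtes3_7j3 insert (end of the listing).
    tail = "ATGGACCTCCGTGGCCAAAAAGTACCATTAAAACCAGAAAGGTGATTGGA" + "A" * 43
    print(find_polya_features(tail, offset=3350))
```

On a full-length insert the first long adenine run is not necessarily the terminal one, so a production version would restrict the search to the 3' end.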


1 GTGCTTTACT GCGCGCTCTG GTACTGCTGT GGCTCCCCGT CCTGGTGCGG 
51 GACCTGTGCC CCGCGCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 
101 GCCGCCCTTG CTCACCTCCT GCTCGCCATG GAGTCGCTGG TTTTCGCGCG 
151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA GCTAGCCCGG CCGCTGGCGG 
201 AAGGGCTGAT CAAGTCGCCC AAGCCCCTAA TGAAGAAGCA GGCGGTGAAG 
251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CGCTACGAGT TCCTGGAGAC 
301 CCTGGGCAAA GGCACCTACG GGAAGGTGAA GAAGGCGCGG GAGAGCTCGG 
351 GGCGCCTGGT GGCCATCAAG TCAATCCGGA AGGACAAAAT CAAAGATGAG 
401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 
451 CCACCCTCAC ATCATTGCCA TCCATGAAGT GTTTGAGAAC AGCAGCAAGA 
501 TCGTGATCGT CATGGAGTAT GCCAGCCGGG GCGACCTTTA TGACTACATC 
551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA GCTAGGCATT TCTTCCGGCA 
601 GATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 
651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 
701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGGAGAC 
751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC 
801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 
851 ATCCTGGTGC ATGGCACCAT GCCCTTTGAT GGGCATGACC ATAAGATCCT 
901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 
951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 
1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 
1051 CACCCGAGTG GGAGAGCAGG AGGCTCCGCA TGAGGGTGGG CACCCTGGCA 
1101 GTGACTCTGC CCGCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 
1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 
1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 
1251 AGAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 
1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 
1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGGG GTACAGGAGG 
1401 ACCCTCCGGA GCTCAGCCCA ATCCCTGCGA GCCCAGGGCA GGCTGCCCCG 
1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGG 
1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 
1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 
1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 
1651 CTCCCAGACA GCCTTGGAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 
1701 ATGAACTCGC CCCACCTCGC CCCCTGGCCC GGGCCAGCCG ACCCTCAGGG 
1751 GCTGTGAGCG AGGACAGCAT CCTGTCCTCT GAGTCCTTTG ACCAGCTGGA 
1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTGT GTGTCTGTGG 
1851 ACAACCTCAC GGGGCTTGAG GAGCCCCCCT CAGAGGGCCC TGGAAGCTGC 
1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 
1951 AGACTGCCAG GAGGTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 
2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 
2051 AGATGCAGCT GGTTGCACCC CGAGGGGAGA TGCCTTCTCC CCCACCTCCC 
2101 AGGACCTGCA TCCCAGCTCA GAAGGCTGAG AGGGTTTGCA GTGGAGCCCT 
2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 
2201 GTGTCTGTCT TCAGCCCTGC TGAACGAAGA GGATACTAAA GAGAGGGGAA 
2251 CGGGAATGCC CGCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 
2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 
2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC 
2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 
2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 
2501 GCATCCTGGG AATGGTCTGG AGTAACGCTT CGTTATTTTT ATTTTTATTT 
2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG 
2601 GCTAGAGTGC AATGGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG 
2651 GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC 
2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT 
2751 CTCCATGTTG GTCAGGCTGG TCTCAAACTC CCGACCTCAG GTGATCCACC 
2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCGCGCCC 
2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT 
2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC 
2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC 
3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG 
3051 ACCTCAGTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT 
3101 ATGGATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT 
3151 TTATGTTCTT GGCTTTGTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA 
3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG 
3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC 
3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCTA 
3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA 
3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 


BLAST Results 


No BLAST result 


Medline entries 


98202387: 

C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and promotes 14-3-3 
protein binding. 


Peptide information for frame 2 


ORF from 128 bp to 2011 bp; peptide length: 628 
Category: strong similarity to known protein 
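
The peptide length follows directly from the ORF coordinates: an ORF from 128 bp to 2011 bp spans 1884 bases, that is 628 codons, with the stop codon lying immediately downstream. The following Python sketch shows that arithmetic together with a translation using the standard genetic code. The helper names are hypothetical, and the coordinates are assumed to be 1-based, inclusive and to exclude the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(orf_peptide_length(128, 2011))          # DKFZphtes3_7j3, frame 2
    print(translate("ATGGAGTCGCTGGTTTTCGCGCGG"))  # bases 128 to 151 of the insert
```

Translating the bases around an uncertain position in the scanned nucleotide listing and comparing the result with the printed peptide is a convenient way to cross-check either sequence.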


1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR 
51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE 
101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER 
151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY 
201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF 
251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH 
301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV 
351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG 
401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK 
451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG 
501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS 
551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL 
601 GDSCFSLTDC QEVTATYRQA LRVCSKLT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j3, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7j3, frame 2 


Report for DKFZphtes3_7j3.2 


[LENGTH] 628 

[MW] 69612.39 

[pI] 9.01 

[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens mRNA for KIAA0537 protein, complete cds. 1e-152 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-42 
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139w CTK1 - carboxy-terminal domain] 2e-28 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-28 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26 
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 8e-22 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 6e-21 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 6e-21 
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-19 
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YDLlSSw] 3e-18 
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-17 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 4e-16 
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 1e-15 
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 5e-15 
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-12 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 8e-05 
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 8e-05 
[BLOCKS] BL00479C Phorbol esters / diacylglycerol binding domain proteins 
[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 
[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 1e-77 
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 4e-68 
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 2e-85 
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 1e-80 
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 2e-76 
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 1e-69 
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-84 
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-68 
[SCOP] d1ydre_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 9e-85 
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 1e-69 
[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 1e-85 
[SCOP] d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-66 
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-47 
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-75 
[SCOP] d1ckja_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 5e-54 
[EC] 2.7.1.38 Phosphorylase kinase 1e-36 
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e-40 
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 1e-61 
[EC] 2.7.1.117 Myosin-light-chain kinase 2e-40 
[EC] 2:7:^-37°%lo^lT^r^asfnr'''"'°'^ "ductase ,«.DPH, ] kinase le-61 
[PIRKW] phosphotransferase 6e-66 
[PIRKW] nucleus 1e-64 
[PIRKW] calcium 7e-35 
[PIRKW] duplication 1e-38 
[PIRKW] tandem repeat 4e-39 
[PIRKW] phorbol ester binding 1e-38 
[PIRKW] zinc 1e-38 
[PIRKW] cell cycle control 1e-42 
[PIRKW] serine/threonine-specific protein kinase 8e-68 
[PIRKW] oncogene 1e-40 
[PIRKW] phospholipid binding 1e-38 
[PIRKW] autophosphorylation 1e-64 
[PIRKW] brain 1e-40 
[PIRKW] heterotetramer 2e-36 
[PIRKW] mitosis 7e-42 
[PIRKW] polymer 1e-35 
[PIRKW] magnesium 6e-66 
[PIRKW] ATP 8e-68 
[PIRKW] polyprotein 1e-40 
[PIRKW] phosphoprotein 1e-64 
[PIRKW] apoptosis 4e-39 
[PIRKW] glycoprotein 7e-42 
[PIRKW] leucine zipper 3e-35 
[PIRKW] skeletal muscle 7e-35 
[PIRKW] protein kinase 5e-41 
[PIRKW] cAMP binding 3e-38 
[PIRKW] testis 9e-36 
[PIRKW] purine nucleotide binding 2e-49 
[PIRKW] calcium binding 8e-39 
[PIRKW] alternative splicing 3e-37 
[PIRKW] P-loop 2e-49 
[PIRKW] lipoprotein 2e-33 
[PIRKW] segmentation 1e-33 
[PIRKW] core protein 1e-40 
[PIRKW] muscle 7e-35 
[PIRKW] myristylation 2e-33 
[PIRKW] EF hand 8e-39 
[PIRKW] cell division 2e-40 
[PIRKW] calmodulin binding 4e-40 
[SUPFAM] ribosomal protein S6 kinase II 5e-36 
[SUPFAM] fibronectin type III repeat homology 3e-33 
[SUPFAM] immunoglobulin homology 3e-33 
[SUPFAM] calcium-dependent protein kinase 8e-39 
[SUPFAM] AMP-activated protein kinase 6e-66 
[SUPFAM] protein kinase akt 3e-42 
[SUPFAM] protein kinase SPK1 1e-42 
[SUPFAM] Caf^/r^?!!^?'''"^!;'' ^^Tyr-specific protein kinases 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-37 
[SUPFAM] calmodulin repeat homology 8e-39 
[SUPFAM] ^tei^KeTzl^^ tr.^r "-^-"<*-''-<li»9 domain h«»ology 6e-33 
[SUPFAM] °^elXllltT.tir^^^^^^^ "^^^-"c c.ai„ le-3. 
[SUPFAM] pleckstrin repeat homology 3e-42 
[SUPFAM] ankyrin repeat homology 4e-39 
[SUPFAM] protein kinase homology 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-41 
[SUPFAM] twUchin 3e-!3 "^"'^-^^'^^^^^ repeat homology le-38 
[SUPFAM] protein kinase C delta 1e-38 
[SUPFAM] cGMP-dependent protein kinase 6e-33 
[SUPFAM] protein kinase cdr1 7e-42 
[SUPFAM] protein kinase C C2 region homology 3e-37 
[SUPFAM] protein kinase C alpha 3e-37 
[SUPFAM] yeast protein kinase C 5e-36 
[SUPFAM] kinase-related transforming protein 1e-41 
[SUPFAM] kinase interaction domain homology 1e-42 
[SUPFAM] gag-akt polyprotein 1e-40 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 4e-40 
[SUPFAM] protein kinase C mu 4e-33 
[PROSITE] PROTEIN_KINASE_ATP 2 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 13 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 2 
[PROSITE] PROTEIN_KINASE_ST 1 
[PFAM] Eukaryotic protein kinase domain 
[KW] All_Alpha 
[KW] 3D 
[KW] LOW_COMPLEXITY 10.51 % 

SEQ MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAVKRHHHKHNLRHRYEFLETLG 

SEG xxxxxxxxxxxx 

IctpE HHHHHHHHHHHHHHHCCCCCCCC— GGGEEEEEEEE 

SEQ KGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREIEIMSSLNHPHIIAIHEVFE 

SEG 

ictpE ctt'teeeeeeeettteeeeeeeeehhhhhhhcchhhhhhhhhhhhccctttbcceeeeee 

SEQ NSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIVSAVHYCHQNRVVHRDLKLEN 

SEG 

IctpE ETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHCCEECCCCCGGG 

SEQ ILLDANGNIKIADFGLSNLYHQGKFLQTFCGSPLYASPEIVNGKPYTGPEVDSWSLGVLL 

SEG 

IctpE EEETTTTCEEECCTTTTEET-TTT-BCCCCCCGGGCCHHHHHCCCBC-HHHHHHHHHHHH 

SEQ YILVHGTMPFDGHDHKILVKQISNGAYREPPKPSDACGLIRWLLMVNPTRRATLEDVASH 

SEG 

IctpE HHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTCHHHHHHHHHTTTTTGGGTTTHHHHHHC 

SEQ WWVNWGYATRVGEQEAPHEGGHPGSDSARASMADWLRRSSRPLLENGAKVCSFFKQHAPG 

SEG 

IctpE GG 

SEQ GGSTTPGLERQHSLKKSRKENDMAQSLHSDTADDTAHRPGKSNLKLPKGILKKKVSASAE 

SEG 

IctpE 

SEQ GVQEDPPELSPIPASPGQAAPLLPKKGILKKPRQRESGYYSSPEPSESGELLDAGDVFVS 

SEG xxxxxxxxxxxx . . . xxxxxxxxxxxxxxx 

IctpE 

SEQ GDPKEQKPPQASGLLLHRKGILKLNGKFSQTALELAAPTTFGSLDELAPPRPLARASRPS 

SEG xxxxxxxxxxxxxx 

IctpE 

SEQ GAVSEDSILSSESFDQLDLPERLPEPPLRGCVSVDNLTGLEEPPSEGPGSCLRRWRQDPL 

SEG xxxxxxxxxxxxx 

IctpE 

SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT 

SEG 

IctpE 


Prosite for DKFZphtes3_7j3.2 


PS00001    121->125  ASN_GLYCOSYLATION    PDOC00001 
PS00001    576->580  ASN_GLYCOSYLATION    PDOC00001 
PS00004    290->294  CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    337->341  CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    413->417  CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    30->33    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    74->77    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    82->85    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    122->125  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    142->145  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    148->151  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    289->292  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    327->330  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    339->342  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    373->376  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    377->380  PKC_PHOSPHO_SITE     PDOC00005 
PS00005    616->619  PKC_PHOSPHO_SITE     PDOC00005 
PS00006    15->19    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    133->137  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    148->152  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    227->231  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    293->297  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    331->335  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    377->381  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    391->395  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    461->465  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    511->515  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    523->527  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    578->582  CK2_PHOSPHO_SITE     PDOC00006 
PS00006    606->610  CK2_PHOSPHO_SITE     PDOC00006 
PS00007    453->460  TYR_PHOSPHO_SITE     PDOC00007 
PS00007    453->461  TYR_PHOSPHO_SITE     PDOC00007 
PS00008    320->326  MYRISTYL             PDOC00008 
PS00008    324->330  MYRISTYL             PDOC00008 
PS00008    347->353  MYRISTYL             PDOC00008 
PS00008    360->366  MYRISTYL             PDOC00008 
PS00016    134->137  RGD                  PDOC00016 
PS00107    59->82    PROTEIN_KINASE_ATP   PDOC00100 
PS00107    59->86    PROTEIN_KINASE_ATP   PDOC00100 
PS00108    171->184  PROTEIN_KINASE_ST    PDOC00100 


Pfam for DKFZphtes3_7j3.2 


HMM_NAME Eukaryotic protein kinase domain 

HMM *YeigRilGeGsFGtVYkCiWrTGelVAIKIIkkrsms FIREI 
YE+++++G+G++G+V+K+++ +G++VAIK I+K++++ ++REI 
Query 53 YEFLETLGKGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREI 101 

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw 
+IM +LNHP+II + ++FE ++ I ++MEY+ GDL+DYI+++ ++SE+ 
Query 102 EIMSSLNHPHIIAIHEVFE-NSSKIVIVMEYASRGDLYDYISERQQLSER 150 

HMM eIrfIMyQILrGMeYLHSMglIHRDLKPEMILIDeNgqlKIcDFGLARqM 
E+R++++QI++++ Y+H ++++HRDLK ENIL+D NG+IKI+DFGL+ ++ 
Query 151 EARHFFRQIVSAVHYCHQNRVVHRDLKLENILLDANGNIKIADFGLSNLY 200 

HMM nnYerM11fCGTPWYMMAPEVIImg.nyY11kVDMWSFGCILWEMMTGep 
+ + ++ TFCG+P Y +PE+ ++G +Y +++VD WS+G++L++++ G+ 
Query 201 HQGKFLQTFCGSPLYA-SPEI-VNGKPYTGPEVDSWSLGVLLYILVHGTM 248 

HMM PFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 
PF+++ ++ I + +++ +p s+ + ++RW++ ++P++R T +++ 
Query 249 PFDGHDHKILVKQISNGAYREPPKPSD-ACGLIRWLLMVNPTRRATLEDV 297 

HMM LnHPWF* 
H W+ 
Query 298 ASHWWV 303 
DKFZphtes3_7j8 


group: testes derived 

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human 
WUGSC:H_DJ1159O04.1. 

The novel protein contains an additional C-terminal domain, which is not present in 
WUGSC:H_DJ1159O04.1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


WUGSC:H_DJ1159O04.1 similarity to YBL104p 

verifies and extends the gene model WUGSC:H_DJ1159O04.1 
similarity to S. cerevisiae YBL104p 

Sequenced by BMFZ 

Locus: /map="7p21-p22" 

Insert length: 3353 bp 

Poly A stretch at pos. 3231, no polyadenylation signal found 


1 GCAAAATATG TTGTATTTGT GGCATAGTTC ATATTTACAC TATCATAAAA 

51 TTATGGCCGA GAAGTTAAAT ATTCTAAATG TGTCAACATA GTTCTCTGTA 

101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 

151 TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT 

201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 

251 TTACAGCTTT GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC 

301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 

351 tggcattgtt caacttggat attcgccgag caatccaaat cctgaatgaa 
401 ggggcatctt ctgaaaaagg agatctgaat ctcaatgtgg tagcaatggc 
451 tttatcgggt tatacggatg agaagaactc cctttggaga gaaatgtgta 
501 gcacactgcg attacagcta aataacccgt atttgtgtgt catgtttgca 
551 tttctgacaa gtgaaacagg atcttacgat ggagttttgt atgaaaacaa 
601 agttgcagta cgtgacagag tggcatttgc ttgtaaattc cttagtgata 
651 ctcagttaaa tagatacatc gaaaagttga ccaatgaaat gaaagaggct 
701 ggaaatttgg aaggaatttt gcttacaggc cttactaaag atggagtgga 
751 cttaatggag agttatgttg atagaactgg agatgttcaa acagcaagtt 
801 actgtatgtt acagggttca cctttagatg ttcttaaaga tgaaagggtt 
851 cagtactgga ttgagaatta tagaaattta ttagatgcct ggaggttttg 
901 gcataaacga gctgaatttg atattcacag gagtaagttg gatcccagtt 
951 ccaagccttt agcacaagtt tttgtgagtt gcaatttctg tggcaagtca 
1001 atctcctaca gctgttcagc tgtgcctcat cagggcagag gttttagtca 
1051 gtatggtgtg agtggctcac caacgaaatc taaagtcaca agttgtcctg 
1101 gctgtcgaaa accacttcct cgatgtgcgc tttgtctcat taatatggga 
1151 acaccagttt ctagctgtcc tggaggaacc aaatcagatg aaaaagtgga 
1201 cttgagcaag gacaaaaaat tagcccaatt taacaactgg tttacatggt 
1251 gtcataattg caggcacggt ggacatgctg gacatatgct tagttggttc 
1301 agggaccatg cagagtgccc tgtgtctgca tgcacgtgta aatgtatgca 
1351 gttggataca acggggaatc tggtacctgc agagactgtc cagccataaa 
1401 atgttaccac cttaagagaa cccttcaagt gtggagcttt ctagtaggtg 
1451 tccttcatag ctcagaaaca tacctcagaa caagccattc atgacttacc 
1501 tgtaatggga aaataaatca ttctatcaga tcagcagttt tgatgtttga 
1551 gtgattttga tatgcttcac agagacaaat gctgccaaaa taaacatcga 
1601 agtatagaca tgagttctgt tcagcaggtt gaaaagtctg atttagaaaa 
1651 actttctaag ttttggttga aattatgaac actctagaag cagaatttct 
1701 ggaagagcca agaacagact ttgagcctat atcttcaaag ctgaaactgg 
1751 atatctttca ataaaatatg tgcactttta aaataaaatg actaattctg 
1801 tgattcagac aatagtttta agttcagctg tgcttagatt tctttcagat 
1851 taatttaaaa ttatagattt ttacttttag aattgcagag cccctatccc 
1901 acactggaga atatttttta ttactgtctg ttatatatgt gtctatgtgt 
1951 gtgtgtatat ttatgtgtgt atgtataaat atgtactttt taaaggagcc 
2001 ttttccctcc tttgatttta agataagcaa tcttttggca taacattatc 
2051 gtcttcctag aaaagccaag atgaagaatc tatcttacaa ctttttctct 
2101 tcagtagaga aaaacatgta ccatttcagg tgaacataca aaattttcac 
2151 tttctacctt ttgccttcca atgtcctgat ttgtcttcaa aggtttttct 
2201 ccatattaat ttgtcatctt atcctcatca cctgagaaca ttttactgca 
2251 tacaaagtct atgcaagatt atatgtaact agccatttag tataatctat 
2301 gtcagtgttt ctgtgctgtc aaattccgtc ctgatttgga ataccatacc 
2351 ttgttctttc caaggtagac taggaagtgt tggggaaata gggtcacttc 
2401 agagaccatt ttagatgtaa gtttttaaat gtaagtgtta ctggggctaa 
2451 gtcagggact ttatttaaaa catttttttt ttctcatttc atagctagat 
2501 agttgtaaga gaaatacaaa gaatttacaa gatgcttctc tgtcatctgc 
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTGTGTTCAT 
2601 AAAGAGTAAA TCTTATTTTA TAGATTTTGG AGAAATAAAA CAAGAATTTT 
2651 AAGAGCTTTC GTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG 
2701 ATTAGTTTGA GGTGTAACCT AAATATTAAA AGTAGATTAA ATTTATTTTT 
2751 TACCTTGAGT GTCTGATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAG 
2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT 
2851 GAAAGACATG GTAATTGCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC 
2901 TTTTTCCAGC CTTCATTTGA GTAAATCTTA ATTGATTTCA TTTTATTAAC 
2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA 
3001 CTGTCTTAAA TATGAAAGTC AGCTTTAAGT AATGTCAGAC TCATATGCAT 
3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC 
3101 AAGTTTTGGA AATACAGTAT AAAACATGAA TGTAAAGTCT ATTATGTAAT 
3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT 
3201 GTATGTGTCA ACCTCTTAAA TGTTTTCTGT GAAAAAAAAA AAAAAAAAAA 
3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 167 bp to 1396 bp; peptide length: 410 
Category: known protein 
Classification: unclassified 
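
The ORF coordinates and the peptide length reported above agree if the range is read as 1-based, inclusive, and exclusive of the stop codon. The following is a minimal sketch of that consistency check; the function name `peptide_length` is hypothetical, and the coordinate convention is an assumption inferred from the reported numbers rather than something the report states.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based inclusive coordinates that cover coding codons only
    (the stop codon is assumed to lie outside the range).
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 167 bp to 1396 bp, as reported for frame 2 above.
print(peptide_length(167, 1396))
```

Under these assumptions the span of 1230 bases corresponds to the 410 residues reported for this frame.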


1 MVESSRHNWS GLDKQSDIQN LNEERILALQ LCGWIKKGTD VDVGPFLNSL 
51 VQEGEWERAA AVALFNLDIR RAIQILNEGA SSEKGDLNLN VVAMALSGYT 
101 DEKNSLWREM CSTLRLQLNN PYLCVMFAFL TSETGSYDGV LYENKVAVRD 
151 RVAFACKFLS DTQLNRYIEK LTNEMKEAGN LEGILLTGLT KDGVDLMESY 
201 VDRTGDVQTA SYCMLQGSPL DVLKDERVQY WIENYRNLLD AWRFWHKRAE 
251 FDIHRSKLDP SSKPLAQVFV SCNFCGKSIS YSCSAVPHQG RGFSQYGVSG 
301 SPTKSKVTSC PGCRKPLPRC ALCLINMGTP VSSCPGGTKS DEKVDLSKDK 
351 KLAQFNNWFT WCHNCRHGGH AGHMLSWFRD HAECPVSACT CKCMQLDTTG 
401 NLVPAETVQP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j8, frame 2 

PIR:S45391 probable membrane protein YBL104C - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 446, P = 4.5e-47 

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 
DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P = 
7.6e-211 


>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 
DJ1159O04 from 7p21-p22, complete sequence. 
Length = 379 

HSPs: 

Score = 2038 (305.8 bits), Expect = 7.6e-211, P = 7.6e-211 
Identities = 379/379 (100%), Positives = 379/379 (100%) 

Query: 1 mvessrhnwsgldkqsdiqnlneerilalqlcgwikkgtdvdvgpflnslvqegeweraa 60 

MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 
Sbjct: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60 

Query: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 

AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 
Sbjct: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 
Query: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
Sbjct: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

Query: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 
Sbjct: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

Query: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 
Sbjct: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

Query: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
Sbjct: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

Query: 361 WCHNCRHGGHAGHMLSWFR 379 

WCHNCRHGGHAGHMLSWFR 
Sbjct: 361 WCHNCRHGGHAGHMLSWFR 379 


Pedant information for DKFZphtes3_7j8, frame 2 


Report for DKFZphtes3_7j8.2 


[LENGTH] 410 

[MW] 45862.45 

[pI] 6.51 

[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04 
from 7p21-p22, complete sequence. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48 

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 

[BLOCKS] BL00534A Ferrochelatase proteins 

[PIRKW] transmembrane protein 2e-46 

[KW] All_Alpha 


SEQ MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 
PRD cccccccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccccchhhhh 

SEQ AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 
PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhccc 

SEQ PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
PRD ccccceeeccccccccccceeeccchhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcc 

SEQ LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 
PRD cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhhhhhhhhhhhhhhhh 

SEQ AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 
PRD hhhhhhhhhhhhhhcccccccccceeeeeeeccccccccccccccccccccccccccccc 

SEQ SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
PRD ccccccccccccccccccceeeeecccccccccccccccccceeeehhhhhhhhhcceee 

SEQ WCHNCRHGGHAGHMLSWFRDHAECPVSACTCKCMQLDTTGNLVPAETVQP 
PRD eecccccccccchhhhhhhhhccccccccccccccccccccccccccccc 


(No Prosite data available for DKFZphtes3_7j8.2) 


(No Pfam data available for DKFZphtes3_7j8.2) 
DKFZphtes3_7p10 


group: Cell Cycle 

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to 
the Xenopus laevis XPMC2 protein. 

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after 
completion of DNA synthesis. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a 
mitotic catastrophe phenotype. Xenopus XPMC2 rescues several different yeast mitotic 
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in 
the nucleus in Xenopus oocytes. The new protein is the human orthologue of XPMC2. 

The new protein can find application in modulating/blocking the cell cycle. 


strong similarity to XPMC2 protein 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="9q34" 
Insert length: 2380 bp 

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318 
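
The two positions reported above can be reproduced from the tail of the sequence listed below. The following is a minimal sketch, assuming that the reported positions are 0-based offsets into the insert, that the polyadenylation signal is the canonical hexamer AATAAA, and that a poly A stretch is a run of at least 15 adenines; the function name `find_polya_features` and the `min_run` threshold are hypothetical and are not taken from the report.

```python
import re

# Last 80 bases of the 2380 bp insert, copied from listing positions 2301-2380.
TAIL = (
    "ACCGAGGATG" "CGATTTAAAA" "TAAATGCAGA" "TGTTTACTTG" "GAAAAAAAAA"
    "AAAAAAAAAA" "AAAAAAAAAA" "AAAAAAAAAA"
)
TAIL_OFFSET = 2300  # 0-based offset of TAIL[0] within the full insert


def find_polya_features(seq, offset=0, min_run=15):
    """Return (signal_pos, polya_pos) as 0-based offsets, or None if absent."""
    signal = seq.find("AATAAA")                    # canonical signal hexamer
    run = re.search("A{%d,}" % min_run, seq)       # first long run of A
    signal_pos = offset + signal if signal >= 0 else None
    polya_pos = offset + run.start() if run else None
    return signal_pos, polya_pos


print(find_polya_features(TAIL, TAIL_OFFSET))
```

Under these assumptions the sketch returns the same two positions as the annotation line, which suggests (but does not prove) that the annotation uses 0-based offsets.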


1 AGCGTGCGTG CTGAGGTATG CGCAACGCGT GCGGGGTCTC TTCCGGAGTC 
51 TTTTCCTGGA CGGGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 
101 AGGCGCTTGT GCTGCCAGGG CGCCGGGCCC GGGGAGGCCG GGGTCTCGGG 
151 TGGCCGCCGG CCCAGGCGCT GGACGGCAGC AGGATGGGGA AGGCGAAGGT 
201 CCCCGCCTCC AAGCGCGCCC CGAGCAGCCC CGTGGCTAAG CCGGGTCCTG 
251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 
301 AGCAAGGCGC GGGAAGTAAG CAAGAAGCCA GCAAGCGGCC CCGGTGCTGT 
351 GGTGCGACCT CCAAAGGCAC CAGAAGACTT TTCTCAAAAC TGGAAGGCGC 
401 TGCAAGAGTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAAGCCTCTT 
451 GTCATCTCTC AGATGGGTTC CAAAAAGAAG CCCAAAATTA TCCAGCAAAA 
501 CAAAAAAGAG ACCTCGCCTC AAGTGAAGGG AGAGGAGATG CCGGCAGGAA 
551 AAGACCAGGA GGCCAGCAGG GGCTCTGTTC CTTCAGGTTC CAAGATGGAC 
601 AGGAGGGCGC CAGTACCTCG CACCAAGGCC AGTGGAACAG AGCACAATAA 
651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTGTTCCA GAACGAGGGG 
701 ACATCGAGCA TAAGAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 
751 ACCGAGGAAG ACATCTGGTT TGACGACGTG GACCCAGCGG ATATCGAAGC 
801 TGCCATAGGT CCAGAGGCGG CCAAGATAGC GAGGAAACAG TTGGGTCAGA 
851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCGGCGGC 
901 CTGACAAGAG CCTTAGCCTT GGACTGTGAG ATGGTGGGCG TGGGCCCTAA 
951 GGGGGAGGAG AGCATGGCCG CCCGTGTGTC CATCGTGAAC CAGTATGGGA 
1001 AGTGCGTTTA TGACAAGTAC GTCAAACCAA CTGAGCCCGT GACGGACTAT 
1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA 
1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA GATGCTGAAG GGCAGAATTC 
1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 
1201 CCAAAAAAGA AGATTCGGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 
1251 AGTAAAGAGT GGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 
1301 GGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 
1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG GAGTGGGAGA GCATGGCCCG 
1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT 
1451 AGCAGTCCTG CCCTGCTGCT GCTGCCGCCC CGCTACAGAG GCAATGTGAC 
1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 
1551 CTTTTCAGAA TCATGGCAGA GGGGCGTGGC GTGGTGCTAC TGAGAAGGTC 
1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 
1651 TGTTTGGGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 
1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TGAGGTTGGG 
1751 GCCTGTGGGC CGCCAGTCCA TACGGTGCTG TCACTGCCCA TCTTCGGTGA 
1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC 
1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCTT GAGGGACCCA 
1901 GACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 
1951 GTGTTACATC AGGGGTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 
2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 
2051 CCCCATAGCA CGAGGATGTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 
2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 
2151 TGTGATGCTA GCCGGTGGCT TTCTGGGCTA GGCCCCAGTT TGAGGCTCCC 
2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAG GGGACGGAGT 
2251 CCAAGGCGTT ATTGGGCCAC CTGACAGCTG GACAGAAAAG GGGCAGACAC 
2301 ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
Entry HSAC2099 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Genomic sequence from Human 9q34; HTGS 
phase 1, 2 unordered pieces. 

Score = 5055, P = 0.0e+00, identities = 1011/1011 
8 exons Bp 104219-116190 


Medline entries 


95157530: 

Cloning and expression of a Xenopus gene that prevents mitotic 
catastrophe in fission yeast. 


Peptide information for frame 1 


ORF from 184 bp to 1449 bp; peptide length: 422 
Category: strong similarity to known protein 
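
The start of this ORF can be checked by translating the first codons of the nucleotide listing above. The following is a minimal sketch using the standard genetic code; the fragment is copied from listing positions 184-216, and the function name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 184-216 of the insert (1-based), i.e. the first 11 codons of the ORF.
ORF_START_FRAGMENT = "ATGGGGAAGGCGAAGGTCCCCGCCTCCAAGCGC"
print(translate(ORF_START_FRAGMENT))
```

The result can be compared with the first residues of the peptide listed below.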


1 MGKAKVPASK RAPSSPVAKP GPVKTLTRKK NKKKKRFWKS KAREVSKKPA 
51 SGPGAVVRPP KAPEDFSQNW KALQEWLLKQ KSQAPEKPLV ISQMGSKKKP 
101 KIIQQNKKET SPQVKGEEMP AGKDQEASRG SVPSGSKMDR RAPVPRTKAS 
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD 
201 PADIEAAIGP EAAKIARKQL GQSEGSVSLS LVKEQAFGGL TRALALDCEM 
251 VGVGPKGEES MAARVSIVNQ YGKCVYDKYV KPTEPVTDYR TAVSGIRPEN 
301 LKQGEELEVV QKEVAEMLKG RILVGHALHN DLKVLFLDHP KKKIRDTQKY 
351 KPFKSQVKSG RPSLRLLSEK ILGLQVQQAE HCSIQDAQAA MRLYVMVKKE 
401 WESMARDRRP LLTAPDHCSD DA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p10, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7p10, frame 1 


Report for DKFZphtes3_7p10.1 


[LENGTH] 422 

[MW] 46671.91 

[pI] 9.79 

[HOMOL] PIR:S53818 XPMC2 protein - African clawed frog 7e-96 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 2e-19 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YGL094c] 7e-13 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YGL094c] 7e-13 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 4 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 11.37 % 

SEQ MGKAKVPASKRAPSSPVAKPGPVKTLTRKKNKKKKRFWKSKAREVSKKPASGPGAVVRPP 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ KAPEDFSQNWKALQEWLLKQKSQAPEKPLVISQMGSKKKPKIIQQNKKETSPQVKGEEMP 
SEG xxxxxxxxxxxx 


PRD cccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeecccccccccccccee 
SEQ AGKDQEASRGSVPSGSKMDRRAPVPRTKASGTEHNKKGTKERTNGDIVPERGDIEHKKRK 

SEQ AKEAAPAPPTEEDIWFDDVDPADIEAAIGPEAAKIARKQLGQSEGSVSLSLVKEQAFGGL 

PRD hhhhcccccccceeeecccccchhhhhhccchhhhhhhhhhcccccchhhhhhhhhhhhh 

SEQ TRALALDCEMVGVGPKGEESMAARVSIVNQYGKCVYDKYVKPTEPVTDYRTAVSGIRPEN 

PRD hhhcccccccccccccchAhhtahhhccccccce^e^ww 

SEQ LKQGEELEVVQKEVAEMLKGRILVGHALHNDLKVLFLDHPKKKIRDTQKYKPFKSQVKSG 

PRD cc<:cchhhhhhhhhhhhi;hcc;;Mccchhhhhhhhhcccc^ 

SEQ RPSLRLLSEKILGLQVQQAEHCSIQDAQAAMRLYVMVKKEWESMARDRRPLLTAPDHCSD 

PRD <:hhhhhhhhhhhhhhhccccccchhAAi;AAhhhhhhhhhhhhhAAhAhccccccc^ 

SEQ DA 

SEG 

PRD cc 


Prosite for DKFZphtes3_7p10.1 


PS00002 51->55 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 107->111 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 156->160 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005 
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005 
PS00005 96->99 PKC_PHOSPHO_SITE PDOC00005 
PS00005 347->350 PKC_PHOSPHO_SITE PDOC00005 
PS00005 359->362 PKC_PHOSPHO_SITE PDOC00005 
PS00005 363->366 PKC_PHOSPHO_SITE PDOC00005 
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005 
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00006 383->387 CK2_PHOSPHO_SITE PDOC00006 
PS00006 413->417 CK2_PHOSPHO_SITE PDOC00006 
PS00007 343->351 TYR_PHOSPHO_SITE PDOC00007 
PS00007 342->351 TYR_PHOSPHO_SITE PDOC00007 
PS00008 130->136 MYRISTYL PDOC00008 
PS00008 151->157 MYRISTYL PDOC00008 
PS00008 221->227 MYRISTYL PDOC00008 
PS00008 239->245 MYRISTYL PDOC00008 
PS00016 171->174 RGD PDOC00016 


(No Pfam data available for DKFZphtes3_7p10.1) 



DKFZphtes3_7p9 


group: nucleic acid management 

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain 
10 protein NDP52. 

The nuclear domain 10 (ND10), also described as POD or Kr bodies, is involved in the development of 
acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this 
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed 
upon viral infection and interferon treatment. ND10 plays an important role in the viral life 
cycle. 

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell 
attachment site. This protein seems to be a novel part of the ND10 complex. 

The new protein can find application in modulation of viral infections and tumour events. 


similarity to nuclear domain 10 protein NDP52 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="329.1 cR from top of Chr12 linkage group" 
Insert length: 3003 bp 

Poly A stretch at pos. 2957, no polyadenylation signal found 


1 AAGGTGAGGG GAACAGCTGA TCCGTCTGTT GGGAGGACAG ATATCTCAAG 
51 GCCAGGATGG AAGAATCACC ACTAAGCCGG GCACCATCCC GTGGTGGAGT 
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTGGAAT 
151 GTCACTACAC CCTTCCCCCA GGCACCATGC CCAGTGCCAG TGACTGGATT 
201 GGCATCTTCA AGGTGGAGGC TGCCTGTGTT CGGGATTACC ACACATTTGT 
251 GTGGTCTTCC GTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA 
301 GTGTCCAGTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC 
351 CAGTTCCGAT ATGTGAACCG CCAGGGCCAG GTGTGTGGGC AGAGCCCCCC 
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TGAACTGGTG ACCCTGGAGG 
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG 
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATGCA 
551 GCTGAAGCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC 
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG 
651 ATGGAACAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA 
701 GAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG 
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA 
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGGCCCTGA CTCGGGAACA 
851 AGAGAAGCTC CTTGGGCAAC TGAAAGAAGT ACAAGCAGAC AAGGAGCAAA 
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG 
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA 
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC 
1051 AGCAGCGGGT GGCCGAGCTG GAGCCCTTGA AGGAGCAGCT TCGAGGGGCC 
1101 CAGGAGCTTG CAGCCTCAAG CCAGCAGAAA GCCACCCTTC TTGGGGAGGA 
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC GAACTACACC 
1201 GCAGCCGCCT GGAAGTGGCT GAAGTTAACG GCAGGCTGGC TGAGCTCGGT 
1251 TTGCACTTGA AGGAAGAAAA ATGCCAATGG AGCAAGGAGC GGGCAGGGCT 
1301 GCTGCAGAGT GTGGAGGCAG AGAAGGACAA GATCCTGAAG CTGAGTGCAG 
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGGAC CCAAAACCAA 
1401 GTGTTCAAGA CTGAGCTGGC CCGGGAGAAG GATTCTAGCC TGGTACAGTT 
1451 GTCAGAAAGT AAGCGGGAGC TGACAGAGCT GCGGTCAGCC CTGCGTGTGC 
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG AGAAACAGGA ATTGCTAGAG 
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTGGCAG ATGAGAAGTG 
1601 GAATGAGGAT GCCACCACAG AGGATGAGGA GGCCGCTGTG GGGCTGAGCT 
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG 
1701 CTCCCACCCT ATGGCCTTTG TGAGCGTGGA GACCCAGGCT CCTCTCCTGC 
1751 TGGGCCTCGA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA 
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG 
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA 
1901 GGAGGCCAAC TTACTGCTTC CTGAACTGGG CAGTGCCTTC TATGACATGG 
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TGGGGGCCCT 
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC 
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT 
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT 
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT 
2201 AGGTTTCATG CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC 
2251 CTAAGAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACTTC 
2301 ATCCTCTCCT ACCTGGCTCT TTTGTCCCAG GGAGGGGTCC TGTTCGGAAG 
2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGGAGGAA CCGGGATGGA 
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTGGTTGC 
2451 TTCTGATACA CAGCCACTGC ACACACACAC TCACACTCAC ACTCCCTTGT 
2501 CTGATGCCCC AAAGCCAATT CCTGGGGCAC CCTACCCTCT CTTATtJggI 
2551 flllrrTr?. GTTTACCTGA GTTTTCTCTG GGGTCTGCAC AGAGGCAGCA 
2601 tS^.S^^'^''^ CATGGCCTCT CAGGTCCCTT TTGGTTCTCA GTTTCATTGG 
2651 JIS^™'' TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC 
2701 CATAACCTTA GGTATTCAGT TTGGAGGGGT TTTTTGTATT TTTGAGGATT 
2751 ^"^^^T^CT GTATCCTCTC CTCGCATCTC CTCACATGGA aIgaS? 
2801 rlrir'''''''' CTTCTGTGAG GAATGGGGGG AACAAGTGGT CCCAGGTATC 
2851 ™ccS CCCTCTCCAG GTCCCCCCAC AGCA^tI^a 
2901 r^ll^?E^l GATATCCATC CCTTTGTAGT TTGAACAAAT ATATTTATAT 
2951 ^'^^^^'^ AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 


BLAST Results 


Entry HS189353 from database EMBL: 
human STS WI-11261. 
Score = 2191, P = 1.4e-92, identities = 463/485 


Medline entries 


95310349: 

Molecular characterization of NDP52, a novel protein of the 
nuclear domain 10, which is redistributed upon virus 
infection and interferon treatment. 

97375672: 

Sot^piotein2"^''^''"' expression, and structure of the nuclear 


Peptide information for frame 3 


ORF from 57 bp to 2129 bp; peptide length: 691 
Category: similarity to known protein 
Prosite motifs: RGD (557-560) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (475-497) 
LEUCINE_ZIPPER (482-504) 
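
The motif positions listed above can be reproduced by pattern matching on the peptide. The following is a minimal sketch, assuming the leucine zipper motif is L-x(6)-L-x(6)-L-x(6)-L and the cell attachment motif is the literal tripeptide RGD; the two fragments are residues 151-200 and 551-560 of the peptide, and the names `PATTERNS` and `scan` are hypothetical. The sketch reports 1-based inclusive coordinates, so its end positions are one lower than those printed in the report, which appears to give the position after the last residue.

```python
import re

# Regular-expression forms of the two motifs (assumed, see lead-in).
PATTERNS = {
    "RGD": "RGD",
    "LEUCINE_ZIPPER": "L.{6}L.{6}L.{6}L",
}


def scan(fragment, first_residue):
    """Return (motif, start, end) hits with 1-based inclusive coordinates."""
    hits = []
    for name, pattern in PATTERNS.items():
        # A lookahead is used so that overlapping matches are also reported.
        for match in re.finditer("(?=(%s))" % pattern, fragment):
            start = first_residue + match.start(1)
            end = start + len(match.group(1)) - 1
            hits.append((name, start, end))
    return hits


ZIPPER_FRAGMENT = "NQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTELME"  # 151-200
RGD_FRAGMENT = "PYGLCERGDP"  # 551-560

print(scan(ZIPPER_FRAGMENT, 151))
print(scan(RGD_FRAGMENT, 551))
```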


1 MEESPLSRAP SRGGVNFLNV ARTYIPNTKV ECHYTLPPGT MPSASDWIGI 
51 FKVEAACVRD YHTFVWSSVP ESTTDGSPIH TSVQFQASYL PKPGAQLYQF 
101 RYVNRQGQVC GQSPPFQFRE PRPMDELVTL EEADGGSDIL LVVPKATVLQ 
151 NQLDESQQER NDLMQLKLQL EGQVTELRSR VQELERALAT ARQEHTELME 
201 QYKGISRSHG EITEERDILS RQQGDHVARI LELEDDIQTI SEKVLTKEVE 
251 LDRLRDTVKA LTREQEKLLG QLKEVQADKE QSEAELQVAQ QENHHLNLDL 
301 KEAKSWQEEQ SAQAQRLKDK VAQMKDTLGQ AQQRVAELEP LKEQLRGAQE 
351 LAASSQQKAT LLGEELASAA AARDRTIAEL HRSRLEVAEV NGRLAELGLH 
401 LKEEKCQWSK ERAGLLQSVE AEKDKILKLS AEILRLEKAV QEERTQNQVF 
451 KTELAREKDS SLVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM 
501 RKLEARLEKV ADEKWNEDAT TEDEEAAVGL SCPAALTDSE DESPEDMRLP 
551 PYGLCERGDP GSSPAGPREA SPLVVISQPA PISPHLSGPA EDSSSDSEAE 
601 DEKSVLMAAV QSGGEEANLL LPELGSAFYD MASGFTVGTL SETSTGGPAT 
651 PTWKECPICK ERFPAESDKD ALEDHMDGHF FFSTQDPFTF E 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p9, frame 3 
PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307, 
P = 7.7e-28 
PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25 

TREMBL:DM35816_4 gene: "zip"; product: "nonmuscle myosin-II heavy 
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip) 
gene, complete cds., N = 1, Score = 254, P = 1.4e-17 


>PIR:A56733 nuclear domain 10 protein NDP52 - human 
Length = 446 

HSPs: 

Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 104/323 (32%), Positives = 158/323 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 

V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 

+ S VQF+A YLPK + YQF YV+ G V G S PFQFR D LV + 

Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFRPENEEDILVVTTQ-- 139 

Query: 135 GGSDILLVVPKATVLQNQ-LDES---QQERNDLMQLKLQLEGQVTE-LRSRVQELERALA 189 

G + + K +NQ L +S Q++K MQ +LQ + + E L+S ++LE + 
Sbjct: 140 GEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVK 199 

Query: 190 TARQE-HTELMEQYKGISRSHGEITEERDI-LSRQQGDHVARILELEDDIQTISEKVLTK 247 

+ TEL+ QK++ E+I + + Q + E+E +Q +K T+ 

Sbjct: 200 EQKDYWETELL-QLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEMEKLVQGDQDK--TE 256 

Query: 248 EVE-LDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSW 306 

++E L + D + EQ K +L++ +Q+E QQE N DL + S 

Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQNETTAMKKQQELMDENFDLSKRLSE 316 

Query: 307 QEEQSAQAQRLKDKVAQMKDTLGQAQQRV 335 

E QR K+++ D L + R+ 

Sbjct: 317 NEIICNALQRQKERLEGENDLLKRENSRL 345 
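
The percentages in the HSP header above appear to be truncated rather than rounded: 158/323 is about 48.9 % but is printed as 48 %. The following is a minimal sketch of that arithmetic; the function name `blast_percent` is hypothetical, and truncation is an assumption inferred from the printed values, not a documented property of the program that produced this report.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Return the whole-number percentage of matches, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Identities and positives of the first HSP against PIR:A56733.
print(blast_percent(104, 323), blast_percent(158, 323))
```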

Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27 
Identities = 98/337 (29%), Positives = 163/337 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 

V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 

+ S VQF+A YLPK + YQF YV+ G V G S PFQFR P +E 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFR PENE 130 

Query: 135 GGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQE 194 

DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE 

Sbjct: 131 — EDILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQE 182 

Query: 195 HTELMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDR 253 

E ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+ 

Sbjct: 183 ELETLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQ 232 

Query: 254 LRDTVKALTREQEKLL--GQLKEVQAD KEQSEAELQVAQQENHHLNLDLKEAKSWQE 308 

L+ + +E EKL+ Q K Q + KE L + +Q L+ + Q 

Sbjct: 233 LQAQLSTQEKEMEKLVQGDQDKTEQLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQN 292 

Query: 309 EQSA--QAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQEL 351 

E +A + Q L D+ + L + + L+ KE+L G +L 

Sbjct: 293 ETTAMKKQQELMDENFDLSKRLSENEIICNALQRQKERLEGENDL 337 

Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06 
Identities = 53/227 (23%), Positives = 113/227 (49%) 

Query: 138 DILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTE 197 

DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE E 

Sbjct: 132 DILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQEELE 185 

Query: 198 LMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDRLRD 256 

++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+L+ 

Sbjct: 186 TLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQLQA 235 

Query: 257 TVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSWQEEQSAQAQR 316 

+ +E EKL VQ D++++E +L+ ++EN HL L L E + Q++ ++ 

Sbjct: 236 QLSTQEKEMEKL VQGDQDKTE-QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQ 288 
Query: 317 LK-DKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELA-ASSQQKATLLGE 364 

+K ++ MK + Q+ + E L ++L + + A +0K L CE 
Sbjct: 289 MKQNETTAMK-— KQQELMDENFDLSKRLSENEIICNALQRQKERLEGE 334 

Score = 103 (15.5 bits), Expect = 4.4e-04, Sum P(2) = 4.4e-04 
Identities = 63/278 (22%), Positives = 123/278 (44%) 

Query: 299 DLKEAKSWQEEQSAQAQRLK^ DTLGQAQQRVAELEPLKEQLRGAQELAAS 354 

^ Q D + Q++ ELE L + + TT 

Sbjct: 141 EVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETL-QSINKKLELKVK 199 

Query: 355 SQQKATLLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAG 414 
EL + +E + + V ++ +L+ + E+ n +** 

Sbjct: 200 EQKD-YWETELLQLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEM-EKLVQGDQDKTE 256 

Query: 415 LLQSVEAEKDKILKLSAEIL---RLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKR 470 

D + h h+ + +LE+ V E+ QN+ T + ++++ avo 

Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTV-EQMKQNET-TAMKKQQELMDENFDLSKR 313 

Query: 471 ELTELRSALRVLQKEKEQLQEEKQELLEYMRKLEARLEKVADE^^ 527 

L+E LQ++KE+L+ E +LL ++ +RL +n t np a 

Sbjct: 314 -LSENEIICNALQRQKERLEGEN-DLL-..KRENSRLLSYMGLDFNSLPyQVPTSDEGG^ 368 

Query: 528 ---VGLSCPAALTD-SEDESPEDMRLPPYGLCERGDPGSSPAGPREASPL 573 

E SP + + +C+ D ++ PT 

Sbjct: 369 RQNPGLAYGNPYSGIQESSSPSPLSIKKCPICKADDICDHTLEQQQMQPL 418 


Score = 64 (9.6 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 13/29 (44%), Positives = 17/29 (58%) 

Query: 651 PTWKECPICKERFPAESDKDALEDHMDGH 679 

P CPIC + FPA ++K EDH+ H 
Sbjct: 417 PLCFNCPICDKIFPA-TEKQIFEDHVFCH 444 

Score = 64 (9.6 bits), Expect = 5.8e+00, Sum P(2) = 1.0e+00 
Identities = 26/90 (28%), Positives = 45/90 (50%) 


Query: 470 RELTELRSALRVLQKEKEQLQEE---KQELLEYMRKLEARLE-KVADEK-W 515 

+E EL+ + LQK+ +Q E KQE LE ++ + +LE KV ++K W 

Sbjct: 154 KENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVKEQKDYWETELLQLK 213 

Query: 516 --NEDATTEDEEAAVGLS-CPAALTDSEDE 542 

N+ ++E+E+ + + A L+ E E 
Sbjct: 214 EQNQKMSSENEKMGIRVDQLQAQLSTQEKE 243 


Score = 47 (7.1 bits), Expect = 4.6e-26, Sum P(2) = 4.6e-26 
Identities = 11/30 (36%), Positives = 17/30 (56%) 

Query: 631 MASGFTVGTLSETSTGGPATPTWKECPICK 660 
ev. ^ P K+CPICK 

Sbjct: 374 LAYGNPYSGIQESSSPSPLSI--KKCPICK 401 

Pedant information for DKFZphtes3_7p9, frame 3 


Report for DKFZphtes3_7p9.3 


[LENGTH] 691 

[MW] 77336.52 

[pI] 4.77 

[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29 

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 

II Srhuddtnn^'* proteins [s. cerevi^iaeV yIrYoVc]' Ve-OS 

myosin.l'isofomr3e?67'''' ^"'^^^'^ ^--^-^ fS. cerevisiae, YHR023w 

m™l isoformJIlo?''''"'''""'"^^^"'^'^^ ^^"^^^^^^ cerevisiae, YHR023w MYOl - 

{Sj X^^o. chJots^e^^^^^^^^^^^ " -^/-^-^ ^-^o-^ 3e.07 

(FUNCATJ 30.10 nuclear oraanization iq " - «revisiae, YJL074c] 4e-07 

rPONCAT nT «K^i^™L?^ cerevxsiae, yNL250w) 4e-06 

' \s. cere" sTae?VB°RT8"9t,'VJTr' """"'"VPe deterodnation. Lx-specific proteins 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1543] 1e-05 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-05 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05 

[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-04 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-04 

[BLOCKS] BL00682B ZP domain proteins 

[EC] 3.6.1.32 Myosin ATPase 1e-13 

[PIRKW] nucleus 6e-10 

[PIRKW] phosphotransferase 2e-07 

[PIRKW] duplication 9e-07 

[PIRKW] citrulline 1e-09 

[PIRKW] tandem repeat 1e-13 

[PIRKW] heart 5e-11 

[PIRKW] endocytosis 5e-09 

[PIRKW] polymorphism 3e-06 

[PIRKW] cornified cell envelope 1e-06 

[PIRKW] transmembrane protein 6e-12 

[PIRKW] serine/threonine-specific protein kinase 2e-07 

[PIRKW] cell wall 1e-06 

[PIRKW] zinc finger 5e-09 

[PIRKW] metal binding 5e-09 

[PIRKW] DNA binding 8e-08 

[PIRKW] muscle contraction 1e-11 

[PIRKW] IgG constant region-binding 1e-06 

[PIRKW] acetylated amino end 4e-09 

[PIRKW] actin binding 1e-13 

[PIRKW] mitosis 9e-09 

[PIRKW] microtubule binding 9e-09 

[PIRKW] ATP 1e-13 

[PIRKW] thick filament 1e-10 

[PIRKW] phosphoprotein 1e-13 

[PIRKW] epidermis 1e-06 

[PIRKW] leucine zipper 1e-07 

[PIRKW] glycoprotein 4e-07 

[PIRKW] skeletal muscle 4e-10 

[PIRKW] disulfide bond 1e-07 

[PIRKW] calcium binding 1e-09 

[PIRKW] alternative splicing 1e-10 

[PIRKW] coiled coil 1e-13 

[PIRKW] P-loop 1e-13 

[PIRKW] heptad repeat 6e-10 

[PIRKW] methylated amino acid 1e-13 

[PIRKW] basement membrane 3e-06 

[PIRKW] immunoglobulin receptor 2e-07 

[PIRKW] peripheral membrane protein 5e-09 

[PIRKW] dimer 1e-07 

[PIRKW] cardiac muscle 1e-10 

[PIRKW] extracellular matrix 3e-06 

[PIRKW] hydrolase 1e-13 

[PIRKW] microtubule 6e-10 

[PIRKW] muscle 2e-09 

[PIRKW] membrane protein 3e-06 

[PIRKW] EF hand 1e-09 

[PIRKW] cytoskeleton 6e-12 

[PIRKW] hair 1e-09 

[PIRKW] calmodulin binding 5e-09 

[PIRKW] Golgi apparatus 3e-08 

[SUPFAM] myosin heavy chain 1e-13 

[SUPFAM] conserved hypothetical PH5 protein 1e-08 

[SUPFAM] hypothetical protein YJL074c 5e-07 

[SUPFAM] centromere protein E 9e-09 

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07 

[SUPFAM] calmodulin repeat homology 1e-09 

[SUPFAM] myosin motor domain homology 1e-13 

[SUPFAM] alpha-actinin actin-binding domain homology 3e-13 

[SUPFAM] tropomyosin 3e-07 

[SUPFAM] plectin 3e-13 

[SUPFAM] trichohyalin 1e-09 

[SUPFAM] pleckstrin repeat homology 4e-06 

[SUPFAM] ribosomal protein S10 homology 3e-13 
ribosomal protein SIO homology 3e-13 
[SUPFAM] giantin 3e-08 

[SUPFAM] protein kinase homology 2e-07 

[SUPFAM] protein kinase C zinc-binding repeat homology 4e-06 

[SUPFAM] involucrin 1e-06 

[SUPFAM] kinesin motor domain homology 9e-09 

[SUPFAM] human early endosome antigen 1 5e-09 

[SUPFAM] unassigned kinesin-related proteins 8e-08 

[SUPFAM] M5 protein 3e-08 

[SUPFAM] cytoskeletal keratin 3e-08 

[PROSITE] LEUCINE_ZIPPER 3 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 6 

[PROSITE] CK2_PHOSPHO_SITE 25 

[PROSITE] PKC_PHOSPHO_SITE 6 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 9.12 % 

[KW] COILED_COIL 39.36 % 

SEQ MEESPLSRAPSRGGVNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRD 

SEG 

PRD cccccccccccccceeeecceeeeeccccceeeeeccccccccccceeeeeeeeeecccc 
COILS 

SEQ YHTFVWSSVPESTTDGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFRE 
SEG 

COILS ^^^^^^^^^^^^^^'^^^^^^^^^^^^^^'^ccccccceeeeeccccccccccccccc^^^ 

SEQ PRPMDELVTLEEADGGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSR 

PRD cccccceeehhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlihh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ VQELERALATARQEHTELMEQYKGISRSHGEITEERDILSRQQGDHVARILELEDDIQTI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlihhhhhhhhhhhhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 


SEQ SEKVLTKEVELDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlihhhlihhhhhhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KEAKSWQEEQSAQAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELAASSQQKAT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCC . . CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC . CCCCCCCCCCCCCCCCCCCC 

SEQ LLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAGLLQSVE 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhlihhhhlih^ 

COILS CCCCCCCC CCCCCCCCCCC 

SEQ AEKDKILKLSAEILRLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKRELTELRSALR 

SEG , 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^^ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCC 

SEQ VLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNEDATTEDEEAAVGLSCPAALTDSE 

SEG . xxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DESPEDMRLPPYGLCERGDPGSSPAGPREASPLVVISQPAPISPHLSGPAEDSSSDSEAE 

SEG xxxxxxxxxxx 

PRD hhhhccccccccccccccccccccccccccceeeeeeccccccccccccccccccccchh 

COILS 

SEQ DEKSVLMAAVQSGGEEANLLLPELGSAFYDMASGFTVGTLSETSTGGPATPTWKECPICK 

SEG XX 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ ERFPAESDKDALEDHMDGHFFFSTQDPFTFE 

SEG 

PRD cccccccchhhhhhhccccceeecccccccc 

COILS 
PS00005  190->193  PKC_PHOSPHO_SITE  PDOC00005
PS00005  241->244  PKC_PHOSPHO_SITE  PDOC00005
PS00005  257->260  PKC_PHOSPHO_SITE  PDOC00005
PS00005  468->471  PKC_PHOSPHO_SITE  PDOC00005
PS00005  652->655  PKC_PHOSPHO_SITE  PDOC00005
PS00005  667->670  PKC_PHOSPHO_SITE  PDOC00005
PS00006  28->32    CK2_PHOSPHO_SITE  PDOC00006
PS00006  43->47    CK2_PHOSPHO_SITE  PDOC00006
PS00006  68->72    CK2_PHOSPHO_SITE  PDOC00006
PS00006  72->76    CK2_PHOSPHO_SITE  PDOC00006
PS00006  129->133  CK2_PHOSPHO_SITE  PDOC00006
PS00006  156->160  CK2_PHOSPHO_SITE  PDOC00006
PS00006  208->212  CK2_PHOSPHO_SITE  PDOC00006
PS00006  239->243  CK2_PHOSPHO_SITE  PDOC00006
PS00006  282->286  CK2_PHOSPHO_SITE  PDOC00006
PS00006  305->309  CK2_PHOSPHO_SITE  PDOC00006
PS00006  376->380  CK2_PHOSPHO_SITE  PDOC00006
PS00006  383->387  CK2_PHOSPHO_SITE  PDOC00006
PS00006  468->472  CK2_PHOSPHO_SITE  PDOC00006
PS00006  520->524  CK2_PHOSPHO_SITE  PDOC00006
PS00006  537->541  CK2_PHOSPHO_SITE  PDOC00006
PS00006  539->543  CK2_PHOSPHO_SITE  PDOC00006
PS00006  543->547  CK2_PHOSPHO_SITE  PDOC00006
PS00006  593->597  CK2_PHOSPHO_SITE  PDOC00006
PS00006  595->599  CK2_PHOSPHO_SITE  PDOC00006
PS00006  597->601  CK2_PHOSPHO_SITE  PDOC00006
PS00006  612->616  CK2_PHOSPHO_SITE  PDOC00006
PS00006  639->643  CK2_PHOSPHO_SITE  PDOC00006
PS00006  652->656  CK2_PHOSPHO_SITE  PDOC00006
PS00006  667->671  CK2_PHOSPHO_SITE  PDOC00006
PS00006  683->687  CK2_PHOSPHO_SITE  PDOC00006
PS00008  39->45    MYRISTYL          PDOC00008
PS00008  107->113  MYRISTYL          PDOC00008
PS00008  204->210  MYRISTYL          PDOC00008
PS00008  414->420  MYRISTYL          PDOC00008
PS00008  561->567  MYRISTYL          PDOC00008
PS00008  613->619  MYRISTYL          PDOC00008
PS00016  557->560  RGD               PDOC00016
PS00029  163->185  LEUCINE_ZIPPER    PDOC00029
PS00029  475->497  LEUCINE_ZIPPER    PDOC00029
PS00029  482->504  LEUCINE_ZIPPER    PDOC00029


(No Pfam data available for DKFZphtes3_7p9.3) 
DKFZphtes3_8e24 


group: signal transduction 

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to 
yeast YGL099w and mouse MMR1 putative GTP-binding proteins. 

GTP-binding proteins are involved in various signal transduction pathways, transferring the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 


strong similarity to guanine nucleotide binding proteins 
complete cDNA, complete cds, potential start at Bp 31, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 3290 bp 

Poly A stretch at pos. 3269, polyadenylation signal at pos. 3251 


1 CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG 
51 TCGCTGGGAC GGGCCCTTAT GCGCCATCAG ACTCAGCGGA GCCGAAGCCA 
101 TCGTCACACT GACTCCTGGT TGCACACAAG TGAACTCAAT GATGGCTATG 
151 ATTGGGGTCG TCTTAATCTT CAGTCAGTGA CTGAACAGAG CTCCCTTGAT 
201 GACTTCCTTG CTACTGCAGA ACTTGCAGGA ACAGAGTTTG TAGCTGAAAA 
251 ACTTAATATT AAGTTTGTGC CTGCTGAGGC TAGAACTGGA CTACTGTCTT 
301 TCGAGGAGAG CCAGAGAATT AAGAAGCTCC ATGAAGAAAA CAAACAGTTC 
351 TTGTGTATAC CGAGGAGACC AAACTGGAAC CAAAATACTA CCCCAGAAGA 
401 ACTCAAACAA GCAGAGAAAG ATAACTTTCT AGAATGGAGA CGTCAGCTTG 
451 TCCGGCTAGA AGAGGAACAG AAGCTGATAT TGACTCCATT TGAACGAAAT 
501 TTGGACTTTT GGCGCCAGCT CTGGAGAGTC ATTGAGAGAA GTGATATTGT 
551 GGTCCAGATA GTAGATGCTC GAAACCCACT CCTGTTTAGA TGTGAGGATT 
601 TGGAATGTTA TGTGAAAGAA ATGGATGCCA ATAAGGAGAA CGTCATTCTG 
651 ATCAACAAGG CAGACTTGCT GACTGCTGAG CAGCGGAGTG CCTGGGCCAT 
701 GTACTTCGAA AAAGAAGATG TGAAGGTTAT TTTCTGGTCA GCTTTGGCCG 
751 GAGCCATTCC CCTGAATGGT GACTCTGAGG AAGAGGCAAA CAGAGATGAT 
801 AGACAAAGCA ACACAACTGA GTTTGGACAT TCCAGTTTCG ACCAGGCTGA 
851 AATTTCCCAC AGTGAATCCG AACATCTCCC AGCTAGGGAT TCTCCTTCAC 
901 TTAGTGAAAA TCCCACAACG GATGAAGATG ACAGTGAGTA TGAGGACTGT 
951 CCAGAGGAGG AGGAAGACGA CTGGCAGACG TGCTCAGAAG AAGACGGTCC 
1001 CAAGGAAGAG GACTGCAGCC AGGACTGGAA GGAAAGCTCT ACTGCAGATT 
1051 CTGAGGCTCG GAGCAGGAAA ACCCCACAGA AGAGGCAGAT ACACAATTTT 
1101 AGCCATCTGG TATCCAAGCA GGAGTTACTG GAGCTCTTTA AGGAGCTACA 
1151 CACTGGGAGA AAGGTGAAAG ATGGGCAACT TACGGTCGGA CTGGTGGGCT 
1201 ACCCTAATGT TGGTAAGAGT TCAACAATCA ACACCATCAT GGGCAACAAG 
1251 AAAGTATCTG TGTCTGCCAC ACCTGGTCAC ACAAAGCACT TTCAGACTCT 
1301 CTATGTGGAG CCTGGCCTCT GCCTGTGTGA CTGTCCTGGC TTGGTGATGC 
1351 CATCTTTTGT GTCTACCAAG GCAGAAATGA CTTGCAGCGG AATCCTCCCA 
1401 ATTGATCAGA TGAGAGATCA TGTTCCTCCT GTATCACTAG TTTGCCAGAA 
1451 TATTCCAAGA CATGTTTTAG AAGCTACCTA TGGCATTAAC ATCATAACGC 
1501 CTAGAGAGGA TGAAGATCCC CACCGACCTC CAACATCGGA AGAACTGTTG 
1551 ACAGCTTATG GATACATGCG AGGATTCATG ACAGCGCATG GACAGCCAGA 
1601 CCAGCCTCGA TCTGCGCGCT ACATCCTGAA GGACTATGTC AGTGGTAAGC 
1651 TGCTGTACTG CCATCCTCCT CCTGGAAGAG ATCCTGTAAC TTTTCAGCAT 
1701 CAACACCAGC GACTCCTAGA GAACAAAATG AACAGTGATG AAATAAAAAT 
1751 GCAGCTAGGC AGAAATAAAA AAGCAAAGCA GATTGAAAAT ATCGTTGACA 
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGGAGTCCAG 
1851 GCTGTGATGG GTTACAAGCC CGGGAGTGGT GTAGTGACTG CATCCACTGC 
1901 GAGCTCTGAG AACGGGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA 
1951 ATAAAAAAGA AAAAAGTCGT AGACTCTACA AGCACCTGGA TATGTGAGGT 
2001 TGGGCTGCAA CAGAAATGTC ATCTGCATTG TGCAGATGGA AAAGAGCAGA 
2051 AGCTGCCTGT TGCCTGTGGA ACTGTCCCAA GACACTAGCA CTGTAGAACG 
2101 GGCCCTGCTC TTGCAGAGCA CGGCTGCACC CAACAGTCTC CATGTCAAGA 
2151 CCAAGGGCCT CCTGGAAACA CCAGCTCTGA CAAAAAGGAG TCATCTGGGA 
2201 GCCCGAGAAT CCTACTCCTG GCCGGGCACA GTGGCTCACG CACCAACATG 
2251 GAGAAACCCC GTCTCTACTA AAAATACAAA AAAATTAGCC AGGCGTGGTG 
2301 GCGCGCACCT GTAATCCCAG CTACTCGGGA GGCTGAGGCA GGAGAATCAC 
2351 TTGAACCAGG GAGGCAGAGT TTGCAGTGAA TGGAGATTGC GCCGCTGCAC 
2401 TCCAGCCTGG GCGACAGAGT GAGACTGCAT CACAAGAAAA AAAATTTGCA 
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC 
2501 CATTCTGAGT GTCCTAGTTG GGTTCCTCCG ACTCTAAACA AGGGACTTGG 
2551 GTTCAGTTAG TGTACAGCGG GGGCTCACGT CCACTAAGGA ACATGTAGAA 
2601 TGTAACCACC GGGTGACAGG GAAGCTGCGG TATTTACTAC CTAGCCCCCA 
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT GTCCAGAATA AGCAAATCTC 
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGGGAA 
2751 GATTGAGGTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTGTA 
2801 GCGATGGTCA CACAACTCTG AATATGCTTA AGACCATTGA ATTACACACT 
2851 TTACGTTGGT GAATTGTATG GTATGTAAAT TATAGTTCAA TAACATAGTT 
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATGTGG TTTGGATCTG 
2951 TGTCCTCACC GAGTCTCATG TTGAAATGTA AGCCCCCTGG TGGGAGGCGA 
3001 TGGGATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCGCTCAG 
3051 TGCTGTTCTC CTGATATTGA GTCCTCATCA CATCTGGTTG CTTCAAAGTG 
3101 TGTGGTGCCT CCCCTCTGTC TCCCTCCTGC TCTGGCCATA TAAGATGTGC 
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAAGTTTCCT GAGGCCTCCC 
3201 TAGAAGCAAA AGCTGCTGTG CTTCCTGTAC CATCTACTGG ACCGTGAGCC 
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 21 bp to 1994 bp; peptide length: 658 
Category: strong similarity to known protein 
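
The ORF coordinates, reading frame and peptide length quoted above are related by simple arithmetic. The following is a minimal sketch of that check; the function name `orf_summary` is hypothetical, and it assumes 1-based inclusive coordinates with the stop codon excluded from the span, which is what the figures in this report suggest.

```python
def orf_summary(start_bp, end_bp):
    """Return (frame, peptide_length) for an ORF.

    Assumptions (not stated in the report itself): coordinates are
    1-based and inclusive, and the stop codon is not part of the span.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start_bp - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return frame, span // 3


# ORF reported above: 21 bp to 1994 bp
frame, peptide_length = orf_summary(21, 1994)
print(frame, peptide_length)  # 3 658
```

Under these assumptions the result agrees with the "frame 3" heading and the stated peptide length of 658.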


1 MGRRRAPAGG SLGRALMRHQ TQRSRSHRHT DSWLHTSELN DGYDWGRLNL 
51 QSVTEQSSLD DFLATAELAG TEFVAEKLNI KFVPAEARTG LLSFEESQRI 
101 KKLHEENKQF LCIPRRPNWN QNTTPEELKQ AEKDNFLEWR RQLVRLEEEQ 
151 KLILTPFERN LDFWRQLWRV IERSDIVVQI VDARNPLLFR CEDLECYVKE 
201 MDANKENVIL INKADLLTAE QRSAWAMYFE KEDVKVIFWS ALAGAIPLNG 
251 DSEEEANRDD RQSNTTEFGH SSFDQAEISH SESEHLPARD SPSLSENPTT 
301 DEDDSEYEDC PEEEEDDWQT CSEEDGPKEE DCSQDWKESS TADSEARSRK 
351 TPQKRQIHNF SHLVSKQELL ELFKELHTGR KVKDGQLTVG LVGYPNVGKS 
401 STINTIMGNK KVSVSATPGH TKHFQTLYVE PGLCLCDCPG LVMPSFVSTK 
451 AEMTCSGILP IDQMRDHVPP VSLVCQNIPR HVLEATYGIN IITPREDEDP 
501 HRPPTSEELL TAYGYMRGFM TAHGQPDQPR SARYILKDYV SGKLLYCHPP 
551 PGRDPVTFQH QHQRLLENKM NSDEIKMQLG RNKKAKQIEN IVDKTFFHQE 
601 NVRALTKGVQ AVMGYKPGSG VVTASTASSE NGAGKPWKKH GNRNKKEKSR 
651 RLYKHLDM 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8e24, frame 3 

SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I., N = 3, Score = 560, P = 1.6e-111 

PIR:S64106 hypothetical protein YGL099w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 544, P = 2.6e-105 

TREMBL:CEAF3143_1 gene: "C53H9.2"; Caenorhabditis elegans cosmid 
C53H9., N = 1, Score = 551, P = 2.9e-53 

SWISSPROT:MMR1_MOUSE POSSIBLE GTP-BINDING PROTEIN MMR1., N = 2, Score = 
311, P = 7.5e-31 


>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I. 

Length = 616 

HSPs: 

Score = 560 (84.0 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 119/253 (47%), Positives = 163/253 (64%) 

Query: 12 LGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLDDFLATAELAGT 71
           LGRA+ T+ R+ + H + + R L+SVT ++ LD+FL TAEL 
Sbjct: 12 LGRAIQSDFTKNRRNRK--GGLKHIVDSDPKAH--RAALRSVTHETDLDEFLNTAELGEV 67

Query: 72 EFVAEKLNIKFVP-AEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWNQNTTPEELKQ 130
Sbjct: 68 EFIAEKQNVTVIQNPEQNPFLLSKEEAARSKQKQEKNKDRLTIPRRPHWDQTTTAVELDR 127

Query: 131 AEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQIVDARNPLLFR 190
           E+++FL WRR L +L++ + I+TPFERNL+ WRQLWRVIERSD+VVQIVDARNPL FR 
Sbjct: 128 MERESFLNWRRNLAQLQDVEGFIVTPFERNLEIWRQLWRVIERSDVVVQIVDARNPLFFR 187

Query: 191 CEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWSALAGAIPLNG 250
           LE YVKE+ +K+N +L+NKAD+LT EQR+ W+ YF + ++ +F+SA A N 
Sbjct: 188 SAHLEQYVKEVGPSKKNFLLVNKADMLTEEQRNYWSSYFNENNIPFLFFSARMAA-EANE 246

Query: 251 DSEEEANRDDRQSN 264
           E+ + SN 
Sbjct: 247 RGEDLETYESTSSN 260


Score = 532 (79.8 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 131/323 (40%), Positives = 192/323 (59%)

Query: 340 STADSEARSRKTPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQ--LTVGLVGYPNV 397
           ST+ +E + +H+ S + + + L +F++ + + DG+ +T GLVGYPNV 
Sbjct: 256 STSSNEIPESLQADENDVHS-SRIATLKVLEGIFEKFAS--TLPDGKTKMTFGLVGYPNV 312

Query: 398 GKSSTINTIMGNKKVSVSATPGHTKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSG 457
           GKSSTIN ++G+KKVSVS+TPG TKHFQT+ + + L DCPGLV PSF +T+A++ G 
Sbjct: 313 GKSSTINALVGSKKVSVSSTPGKTKHFQTINLSEKVSLLDCPGLVFPSFATTQADLVLDG 372

Query: 458 ILPIDQMRDHVPPVSLVCQNIPRHVLEATYGINI-ITPREDEDPHRPPTSEELLTAYGYM 516
           +LPIDQ+R++ P +L+ + IP+ VLE Y I I I P E E P+++E+L + 
Sbjct: 373 VLPIDQLREYTGPSALMAERIPKEVLETLYTIRIRIKPIE-EGGTGVPSAQEVLFPFARS 431

Query: 517 RGFMTAH-GQPDQPRSARYILKDYVSGKLLYCHPPPG--RDPVTFQHQHQRLLENKMNSD 573
           RGFM AH G PD R+AR +LKDYV+GKLLY HPPP F +H + + + SD 
Sbjct: 432 RGFMRAHHGTPDDSRAARILLKDYVNGKLLYVHPPPNYPNSGSEFNKEHHQKIVSA-TSD 490

Query: 574 EIKMQLGR---NKKAKQIEN-IVDKTFFHQEN--VRALTKGVQAVM-G--YKPGSGVVTA 624
           I +L R + E+ +VD +F QEN VR + KG M G YK + + 
Sbjct: 491 SITEKLQRTAISDNTLSAESQLVDDEYF-QENPHVRPMVKGTAVAMQGPVYKGRNTMQPF 549

Query: 625 STASSENGAGK-PWKKHGNRNKKEKSRRL 652
           +++ + K P G K+R+L 
Sbjct: 550 QRRLNDDASPKYPMNAQGKPLSRRKARQL 578


Score = 47 (7.1 bits), Expect = 1.3e-60, Sum P(3) = 1.3e-60
Identities = 21/84 (25%), Positives = 35/84 (41%)

Query: 552 GRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQENVRALTKGVQA 611
           G D T++ + + +DE + R K +E I +K F TK 
Sbjct: 248 GEDLETYESTSSNEIPESLQADENDVHSSRIATLKVLEGIFEK--FASTLPDGKTKMTFG 305

Query: 612 VMGYKPGSGVVTASTASSENGAGK 635
           ++GY P G +ST ++ G+ K 
Sbjct: 306 LVGY-PNVG--KSSTINALVGSKK 326


Score = 43 (6.5 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 7/13 (53%), Positives = 9/13 (69%)

Query: 638 KKHGNRNKKEKSR 650
           KKH +NK+ K R 
Sbjct: 596 KKHNKKNKRSKQR 608
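
The "Identities" figure of an HSP can be recomputed from the aligned strings. The sketch below does so for the short final HSP (query residues 638-650 against subject residues 596-608); the function name `identity_stats` is hypothetical. The percentage is truncated rather than rounded, which is an inference from the figures in this report (for example 131/323 is printed as 40%). Counting "Positives" would additionally need a substitution matrix and is not attempted here.

```python
def identity_stats(query, sbjct):
    """Count identical aligned residues in two equal-length aligned strings.

    Returns (identities, alignment_length, percent), with the percentage
    truncated to an integer (assumed to mirror the report's convention).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    length = len(query)
    return identities, length, identities * 100 // length


# Aligned segments copied from the HSP above.
print(identity_stats("KKHGNRNKKEKSR", "KKHNKKNKRSKQR"))  # (7, 13, 53)
```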



Pedant information for DKFZphtes3_8e24, frame 3 


Report for DKFZphtes3_8e24.3 


[LENGTH] 658 
[MW] 75226.58 
[pI] 5.86 
[HOMOL] SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME 
I. 5e-56 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55 
[FUNCAT] r general function prediction [M. jannaschii, MJ1464] le-ie 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09 
[PIRKW] P-loop 1e-27 
[PIRKW] GTP binding 1e-27 
[SUPFAM] conserved hypothetical protein MG442 7e-08 



[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 19 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.56 % 


SEQ MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD 

SEG xxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccccchhhhhhhhccccch 

SEQ DFLATAELAGTEFVAEKLNIKFVPAEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWN 

SEG 

PRD hhhhhhhhhhheeeecccceeeeeeccccccchhhhhhhhhhhhhlihhhhhccccccccc 

SEQ QNTTPEELKQAEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhcceeeee 

SEQ VDARNPLLFRCEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWS 

SEG 

PRD eccccccccchhhhhhhhhhhccccceeeeecccchhhhhhhhhhhhhhhhccceeeeec 

SEQ ALAGAIPLNGDSEEEANRDDRQSNTTEFGHSSFDQAEISHSESEHLPARDSPSLSENPTT 

SEG 

PRD cccccccccccchhhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

SEQ DEDDSEYEDCPEEEEDDWQTCSEEDGPKEEDCSQDWKESSTADSEARSRKTPQKRQIHNF 

SEG xxxxxxxxxxxx:xxxxx 

PRD cccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccccccc 

SEQ SHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKSSTINTIMGNKKVSVSATPGH 

SEG 

PRD ccccchhhhhhhhhhhhhhhccccceeeeeecccccccccceeeeccccceeeeeccccc 

SEQ TKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSGILPIDQMRDHVPPVSLVCQNIPR 

SEG 

PRD cceeeeeeeccceeecccccccccccchhhhhhhhccccccccccccccceeeeecccch 

SEQ HVLEATYGINIITPREDEDPHRPPTSEELLTAYGYMRGFMTAHGQPDQPRSARYILKDYV 

SEG 

PRD hhhhhhhhccccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhcc 

SEQ SGKLLYCHPPPGRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQE 

SEG 

PRD ccceeeeccccccccccchhhhhhhhhhcccchhhhhhhhcchhhhhhhhhhhhccccch 

SEQ NVRALTKGVQAVMGYKPGSGVVTASTASSENGAGKPWKKHGNRNKKEKSRRLYKHLDM 

SEG 

PRD hhhhhhhceeeeeecccccceeecccccccccccccccccccccchhhhhhhhhhccc 


Prosite for DKFZphtes3_8e24.3 


PS00001  264->268  ASN_GLYCOSYLATION  PDOC00001
PS00001  359->363  ASN_GLYCOSYLATION  PDOC00001
PS00004  410->414  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  21->24    PKC_PHOSPHO_SITE   PDOC00005
PS00005  26->29    PKC_PHOSPHO_SITE   PDOC00005
PS00005  97->100   PKC_PHOSPHO_SITE   PDOC00005
PS00005  348->351  PKC_PHOSPHO_SITE   PDOC00005
PS00005  378->381  PKC_PHOSPHO_SITE   PDOC00005
PS00005  448->451  PKC_PHOSPHO_SITE   PDOC00005
PS00005  493->496  PKC_PHOSPHO_SITE   PDOC00005
PS00005  531->534  PKC_PHOSPHO_SITE   PDOC00005
PS00005  541->544  PKC_PHOSPHO_SITE   PDOC00005
PS00005  649->652  PKC_PHOSPHO_SITE   PDOC00005
PS00006  52->56    CK2_PHOSPHO_SITE   PDOC00006
PS00006  57->61    CK2_PHOSPHO_SITE   PDOC00006
PS00006  93->97    CK2_PHOSPHO_SITE   PDOC00006
PS00006  123->127  CK2_PHOSPHO_SITE   PDOC00006
PS00006  155->159  CK2_PHOSPHO_SITE   PDOC00006
PS00006  252->256  CK2_PHOSPHO_SITE   PDOC00006
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006
PS00006  279->283  CK2_PHOSPHO_SITE   PDOC00006
PS00006  281->285  CK2_PHOSPHO_SITE   PDOC00006
PS00006  293->297  CK2_PHOSPHO_SITE   PDOC00006
PS00006  299->303  CK2_PHOSPHO_SITE   PDOC00006
PS00006  305->309  CK2_PHOSPHO_SITE   PDOC00006
PS00006  320->324  CK2_PHOSPHO_SITE   PDOC00006
PS00006  322->326  CK2_PHOSPHO_SITE   PDOC00006
PS00006  340->344  CK2_PHOSPHO_SITE   PDOC00006
PS00006  365->369  CK2_PHOSPHO_SITE   PDOC00006
PS00006  449->453  CK2_PHOSPHO_SITE   PDOC00006
PS00006  493->497  CK2_PHOSPHO_SITE   PDOC00006
PS00006  505->509  CK2_PHOSPHO_SITE   PDOC00006
PS00007  480->488  TYR_PHOSPHO_SITE   PDOC00007
PS00007  190->198  TYR_PHOSPHO_SITE   PDOC00007
PS00008  9->15     MYRISTYL           PDOC00008
PS00008  432->438  MYRISTYL           PDOC00008
PS00008  620->626  MYRISTYL           PDOC00008
PS00009  1->5      AMIDATION          PDOC00009
PS00009  378->382  AMIDATION          PDOC00009
PS00017  393->401  ATP_GTP_A          PDOC00017


(No Pfam data available for DKFZphtes3_8e24.3) 
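
The ATP_GTP_A (P-loop) entry in the Prosite listing above can be reproduced by scanning the peptide with the PROSITE PS00017 consensus, [AG]-x(4)-G-K-[ST], written as a regular expression. This is a minimal sketch: `find_motif` is a hypothetical helper, and the coordinate convention (1-based start, end one past the last matched residue) is inferred from the listing rather than documented in it.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(pattern, sequence, offset=0):
    """Return (start, end) pairs: 1-based start, end one past the match.

    `offset` is the number of residues preceding `sequence` in the full
    peptide. Overlapping matches are reported.
    """
    hits = []
    pos = 0
    while True:
        match = pattern.search(sequence, pos)
        if match is None:
            break
        hits.append((match.start() + 1 + offset, match.end() + 1 + offset))
        pos = match.start() + 1
    return hits


# Residues 351-400 of the 658-residue peptide, copied from the listing above.
fragment = "TPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKS"
print(find_motif(P_LOOP, fragment, offset=350))  # [(393, 401)]
```

The single hit corresponds to GYPNVGKS and agrees with the 393->401 entry for PS00017.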
DKFZphtes3_8g11 


group: testes derived 

DKFZphtes3_8g11 encodes a novel proline-rich 939 amino acid protein without similarity to 
known proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown, proline-rich protein 
1 EST hit (from testis library) 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 3100 bp 

Poly A stretch at pos. 3056, polyadenylation signal at pos. 3041 


1 AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG 
51 AAGAAAGTGA GGACTCACAG AGTGATTCCC AGACAAGGAT TTCTGAGTCC 
101 CAACACTCCC TCAAGCCAAA TTATCTTTCC CAGGCCAAGA CTGACTTCTC 
151 AGAACAGTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA GCAGCAAAAC 
201 TCTTAAGGAG TCAAATACCC CCCGATGTGC CTCCACCTCT AGCTTCAGGT 
251 CTAGTCCTAA AATACCCTAT CTGCCTACAG TGTGGCCGAT GTTCAGGACT 
301 TAATTGCCAT CATAAATTAC AGACCACTTC GGGGCCTTAT CTTCTTATCT 
351 ATCCACAGCT CCACCTTGTA CGCACTCCTG AAGGCCATGG TGAGGTTCGG 
401 TTGCATCTTG GCTTTAGGCT GAGAATTGGG AAAAGATCCC AAATCTCAAA 
451 GTATCGTGAA AGAGATAGAC CCGTCATACG GAGAAGCCCT ATATCACCAT 
501 CACAAAGGAA AGCTAAAATC TATACTCAAG CTTCCAAGAG TCCTACTTCC 
551 ACAATAGATT TGCAGTCTGG GCCTTCCCAG TCCCCTGCTC CTGTACAAGT 
601 CTACATCAGG CGAGGACAAC GCAGCAGGCC TGACTTAGTA GAAAAGACAA 
651 AAACTAGAGC ACCTGGGCAC TATGAATTCA CTCAAGTTCA CAACCTACCA 
701 GAGAGTGACT CTGAAAGCAC TCAGAATGAA AAACGGGCTA AAGTGAGAAC 
751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAAGAGAATC ACCAAGCGAC 
801 TTAGAAAACA CAGAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT 
851 CCTTCTAGGG AATTAGCAGC CCATTTAAGA AGGAAGAGGA TTGGAGCAAC 
901 TCAGACAAGT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC 
951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAAGCGGGC ATTCCAAACA 
1001 GCACACAGAG TTATAGCTTC TGTTGGGCGG AAGCCTGTGG ACGGGACAAG 
1051 GCCAGACAAT TTGTGGGCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 
1101 GGGACTATTG CTTACCAAGC AGTATCAAAA GAGACAAGAG GTCAGCTGAC 
1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTGTGGGG 
1201 AGGAACGGTC CAGTGCAGAT CAGCTCAACA GCCAAGAAGA GCTTACTCTT 
1251 TCCAACCCAG ACCTCTTCGA CTGCCCAAGC CCACAGATTC CCAAAGTGGT 
1301 ATTGCTTTCC AAACTGCCTC AGTGGGGCAG CCTCTGAGAA CTGTTCAAAA 
1351 GGACAGTAGT AGCAGATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA 
1401 GCCAGGAGTC TAAGAACTTG TCCACACCAG GAACCAGAGT TCAGGCCCGA 
1451 GGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT 
1501 TAAAGACAAA CTCACACACA AGGAGCATAA CCACCCCAGC TTCTATAGGG 
1551 AGAGAACCCC ACGCGGTCCT TCTGAGAGAA CCCGTCATAA CCCCTCTTGG 
1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACGCAGTT CCTTGGAGAG 
1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA 
1701 ACCATTCCAG TCCTTCTGAG AGAAGCTGGC GCAGTCCGTC TCAGAGAAAT 
1751 CACTGCAGTC CCCCCGAGAG GAGCTGTCAC AGTCTCTCTG AAAGGGGCCT 
1801 TCACAGTCCC TCTCAGAGGA GCCATCGCGG TCCCTCTCAG AGAAGACATC 
1851 ACAGTCCCTC AGAGAGAAGC CATCGCAGTC CCTCAGAGAG AAGCCATCGC 
1901 AGTCCCTCTG AGAGAAGACA TCGCAGTCCC TCCCAGAGGA GCCATCGCGG 
1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC 
2001 CCTCTCAGAG GAGCCATCGT GGTCCCTCTG AGAGAAGACA TCACAGTCCC 
2051 TCTAAGAGAA GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 
2101 AGAGAGAAGC CATCACAGTC CCTCTGAGAG AAGCCATCAC AGTCCCTCTG 
2151 AGAGAAGACA TCACAGTCCC TCTGAGAGAA GCCATTGCAG TCCCTCTGAG 
2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC CCTCTGAGAG 
2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTGAGAGAA 
2301 GCCATCACAG TCCCTCTGAG AGAAGACGTC ACAGTCCCTT GGAGAGGAGC 
2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGAGGAGATC 
2401 TCACAGGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA 
2451 GTCCCTCAGA GAAGAGCCAC CTCAGTCCCT TGGAAAGAAG CCGTTGCAGT 
2501 CCCTCTGAGA GGAGAGGACA CAGTTCCTCT GGGAAAACCT GTCACAGTCC 
2551 CTCTGAGAGA AGCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT 
2601 CTGAGAGGAG CCATCGCAGT TCCTGTGAGA GAACCCGTCA CAGTCCCTCT 
2651 GAGATGAGGC CAGGGAGGCC CTCTGGGAGG AACCATTGCA GTCCCTCTGA 
2701 GAGGAGCCGA CGCAGTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG 
2751 GAGAGAGGCC CAGCCATAGT TTGTCTAGAG ATTTCAAGAA TCAAACAACT 
2801 CTCCTCGGGA CCACACATAA AAATCCCAAA GCAGGGCAAG TGTGGAGGCC 
2851 TGAAGCTACT CGATGAGGCG AGGTCCGCCC CTATTATTCA TTGTCCTAAG 
2901 TCTTCATCGT GCTGCCCTTT CCAGGCTTCT TTCCTGCTCA GCCACTGCCT 
2951 CCAATTCCTG CGCCCCCAGC GTGGAAAGGC TTCCATTTCT CTCTACCGGG 
3001 GGGGAGGCGG GTGAGAATGG GTCTGTAATT TCTCTAAGAT GAATAAAGGG 
3051 GCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP__GTP_A (824-832) 


1 MEESEDSQSD SQTRISESQH SLKPNYLSQA KTDFSEQFQL LEDLQLKIAA 

51 KLLRSQIPPD VPPPLASGLV LKYPICLQCG RCSGLNCHHK LQTTSGPYLL 

101 IYPQLHLVRT PEGHGEVRLH LGFRLRIGKR SQISKYRERD RPVIRRSPIS 

151 PSQRKAKIYT QASKSPTSTI DLQSGPSQSP APVQVYIRRG QRSRPDLVEK 

201 TKTRAPGHYE FTQVHNLPES DSESTQNEKR AKVRTKKTSD SKYPMKRITK 

251 RLRKHRKFYT NSRTTIESPS RELAAHLRRK RIGATQTSTA SLKRQPKKPS 

301 QPKFMQLLFQ SLKRAFQTAH RVIASVGRKP VDGTRPDNLW ASKNYYPKQN 

351 ARDYCLPSSI KRDKRSADKL TPAGSTIKQE DILWGGTVQC RSAQQPRRAY 

401 SFQPRPLRLP KPTDSQSGIA FQTASVGQPL RTVQKDSSSR SKKNFYRNET 

451 SSQESKNLST PGTRVQARGR ILPGSPVKRT WHRHLKDKLT HKEHNHPSFY 

501 RERTPRGPSE RTRHNPSWRN HRSPSERSQR SSLERRHHSP SQRSHCSPSR 

551 KNHSSPSERS WRSPSQRNHC SPPERSCHSL SERGLHSPSQ RSHRGPSQRR 

601 HHSPSERSHR SPSERSHRSP SERRHRSPSQ RSHRGPSERS HCSPSERRHR 

651 SPSQRSHRGP SERRHHSPSK RSHRSPARRS HRSPSERSHH SPSERSHHSP 

701 SERRHHSPSE RSHCSPSERS HCSPSERRHR SPSERRHHSP SEKSHHSPSE 

751 RSHHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSFE RSHRRISERS 

801 HSPSEKSHLS PLERSRCSPS ERRGHSSSGK TCHSPSERSH RSPSGMRQGR 

851 TSERSHRSSC ERTRHSPSEM RPGRPSGRNH CSPSERSRRS PLKEGLKYSF 

901 PGERPSHSLS RDFKNQTTLL GTTHKNPKAG QVWRPEATR 
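
The repetitive character of this peptide can be made concrete by tallying residues. The sketch below counts the amino acids in one 50-residue stretch (residues 601-650 of the listing above); the helper name `composition` is hypothetical, and the figures describe only this fragment, not the whole protein.

```python
from collections import Counter


def composition(peptide):
    """Map each residue to (count, percent of the peptide), most common first."""
    counts = Counter(peptide)
    total = len(peptide)
    return {aa: (n, 100.0 * n / total) for aa, n in counts.most_common()}


# Residues 601-650, copied from the peptide listing above.
fragment = "HHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHR"
for aa, (n, pct) in composition(fragment).items():
    print(f"{aa} {n:2d} {pct:5.1f}%")
```

Running the same function over the full 939-residue sequence would give the overall composition.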

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g11, frame 2 

TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific 
acidic repeat protein precursor"; Phytophthora infestans cyst 
germination specific acidic repeat protein precursor (car90) gene, 
complete cds., N = 1, Score = 457, P = 2.3e-39 

TREMBL:AC004561_38 gene: "F16P2.41"; product: "putative proline-rich 
protein"; Arabidopsis thaliana chromosome II BAC F16P2 genomic 
sequence, complete sequence., N = 1, Score = 340, P = 4.2e-27 

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus 
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P = 
3.6e-24 

PIR:PN0099 son3 protein - human (fragment), N = 1, Score = 292, P = 
1.2e-22 


>TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 
Length = 1,489 

HSPs: 

Score = 457 (68.6 bits), Expect = 2.3e-39, P = 2.3e-39 

Identities = 91/444 (20%), Positives = 239/444 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 584 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 642 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 643 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 702 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 703 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 762 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E++ ++P+E ++P E + + 
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + +P++ ++ E + + E +++P+E++ +P E + P+E ++ + +T 
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E + +P+ +E + + E T + P+E P+ +P+E + +P+ 

Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 893 KEGLKYSFPGERPSHSLSRDFKNQTT 918 

+E Y+ P E +++ + + + T 
Sbjct: 1003 EE-TTYA-PTEETTYAPAEETPYEPT 1026 

Score = 445 (66.8 bits), Expect = 4.5e-38, P = 4.5e-38 
Identities = 83/394 (21%), Positives = 212/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + E ++P++ + +P+ + P+E + 

Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ p E + ++ +E ++P++ + P+++ ++P+E + +P+E +- P+ 

Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E +P++ + P+E + + +E +P++ + P+E + P++ + +P + 

Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P+E ++P+E + P+E + +P+E +P+E ++P 
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + ++P+E + P E + ++ E + +P+E ++ S E + + E + 
Sbjct: 1003 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETT 1062 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ P E + +P+E ++ + +T ++P+E + +P+ +E + 

Sbjct: 1063 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 1122 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P E + P+E 

Sbjct: 1123 EETTYAPTEETTYAPTEETMYAPIEETTYGPTEE 1156 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37 
Identities = 86/421 (20%), Positives = 223/421 (52%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+ P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 848 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 906 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 907 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 966 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653
           P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 967 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 1026

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713
           + + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 1027 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1086

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773
           +P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 1087 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1146

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832
           E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 1147 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 1206

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892
           ++P+E + +P+ +E + + E T + P+E P+ +P+E + +P 
Sbjct: 1207 YAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPT 1266

Query: 893 KE 894
           +E 
Sbjct: 1267 EE 1268

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37
Identities = 91/434 (20%), Positives = 232/434 (53%)

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533
           +P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 
Sbjct: 440 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 498

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593
           E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 499 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 558

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653
           P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 559 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 618

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713
           + + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 619 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 678

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773
           +P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 679 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 738

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832
           E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892
           ++P+E ++P TE + +ET ++P+E P P+ +P+E + +P 
Sbjct: 799 YAPTEETTYAP-------TEETPYEPT-EETTYAPTEETPYEPTEETTYTPTEETTYAPT 850

Query: 893 KEGLKYSFPGERPSHS 908
           +E Y+ P E+ +++ 
Sbjct: 851 EE-TTYA-PTEKTTYA 864

Score = 437 (65.6 bits), Expect = 3.3e-37, P = 3.3e-37
Identities = 85/417 (20%), Positives = 223/417 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561
           E TP P+E T + P+ +P+E + + E+ ++P++ + +P+ + P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621
           +P++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681
           E +P++ + P+E + +P+E P++ + P+E ++P++ + +P + 
Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741
           +P+E + ++P+E + + P+E ++P+E + +P+E + + +E +P+E ++P+ 
Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800
           E++ + P+E + ++P+E ++P E + ++ E + +P+E ++ E + + E + 
Sbjct: 659 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 718

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P E + + +T ++P+E + +P+ +E + 

Sbjct: 719 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 778 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTT 918 

T + + P+E P+ +P+E + +P +E Y P E +++ + + + T 

Sbjct: 779 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEE-TPYE-PTEETTYAPTEETPYEPT 834 

Score = 428 (64.2 bits), Expect = 3.1e-36, P = 3.1e-36 
Identities = 89/440 (20%), Positives = 228/440 (51%) 

Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531 

P P + T + K+ T+ E T P+E T + P+ P+E + + 

Sbjct: 470 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 528 

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591 

E ++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++ 

Sbjct: 529 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 588 

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651 

+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E + 
Sbjct: 589 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 648 

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 649 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 709 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 768 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 769 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 828 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890
T + P+E + +P+ +E + + E+T ++P+E P+ P+E + + 

Sbjct: 829 TPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYA 888 


Query: 891 PLKEGLKYSFPGERPSHSLSRD 912 

P KE Y+ P E +++ + + 
Sbjct: 889 PTKE-TTYA-PTEETTYASTEE 908 

Score = 427 (64.1 bits), Expect = 4.0e-36, P = 4.0e-36
Identities = 81/394 (20%), Positives = 213/394 (54%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T GP+E T + P+ +P+E + + E + P+ + +P+ + +P+E +

Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++P+E + +P+E + +P+ 
Sbjct: 799 YAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPT 858 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E+ +P++ + P+E + P+E +P++ + P+E ++ ++ + +P + 

Sbjct: 859 EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETT 918 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+ P+E + + P+E + ++P+E ++P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 919 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 978 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + ++P+E ++P+E + ++ E + +P+E + E + + E + 
Sbjct: 979 EETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1038

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860

++P+E++ + E + +P+E ++ + +T + P+E + +P+ +E + +

Sbjct: 1039 YAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPT 1098

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894

E T ++P+E P+ P+E + +P +E 

Sbjct: 1099 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 1132 

Score = 424 (63.6 bits), Expect = 8.5e-36, P = 8.5e-36
Identities = 81/394 (20%), Positives = 210/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E + P++ + +P+ + +P+E + 

Sbjct: 939 EETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETM 998 
Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P + +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 999 YAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPT 1058 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E +P++ + p+E ++P++ + +PA + 
Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + + ++P+E ++P E + P+E + +P+E +P+E ++p+ 

Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + P+ + ++P+E ++P E + ++ E + +P+E + E + + E + 
Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

+ +P E + +P+E ++ + +T ++P + + p+ +E + + 

Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+G +P+E + +P +E 

Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEE 1332 

Score = 422 (63.3 bits), Expect = 1.4e-35, P = 1.4e-35
Identities = 84/407 (20%), Positives = 216/407 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ p+E + + E + P++ + +P+ + + 

Sbjct: 795 EETTYAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETT 854 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P+++ +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 855 YAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPT 914

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E +P++ + P+E + +P+E +P++ + P+E ++P++ + +

Sbjct: 915 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 974

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P E + +P+E + +P+E p+E ++P+ 

Sbjct: 975 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPT 1034

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 
ow . + ++ +E ++P E + ++ E + P+E ++ E + + E + 

Sbjct: 1035 EETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETT 1094 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P+E + + +T ++P+E + +P+ e + 

Sbjct: 1095 YAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPT 1154 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E T ++P+E P+ +P+E + p E Y+ P E +++ 

Sbjct: 1155 EETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 1200 

Score = 421 (63.2 bits), Expect = 1.8e-35, P = 1.8e-35
Identities = 86/418 (20%), Positives = 219/418 (52%) 

Query: 491 HKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSR 550 

H H E T P+E T + P+ +P+E + + E + P++ + +p+ 

Sbjct: 376 HYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTPTE 435 

Query: 551 KNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHR 610

+ +P+E + +P+++ +P E + ++ +E + P++ + P++ ++P+E + 
Sbjct: 436 ETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTY 495 

Query: 611 SPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSK 670 

. ... * •'^•'"^ * + + P+E ++P++ 

Sbjct: 496 ASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTE 555 

Query: 671 RSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHR 730 

+ +PA + P+E + ++P+E + ++P+E ++P E + +P+E + +P+E 
Sbjct: 556 ETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPY 615

Query: 731 SPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFE 790 

P+E ++P+E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E 
Sbjct: 616 EPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTE 675 

Query: 791 RS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQG 849 
+ + E +++P+E++ +P E + +P+E + + +T ++P+E + +P+ 
Sbjct: 676 ETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMY 735

Query: 850 RTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E + E T ++P+E P+ +P+E + P E Y+ P E +++ 

Sbjct: 736 APIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 792 

Score = 420 (63.0 bits), Expect = 2.3e-35, P = 2.3e-35 
Identities = 82/393 (20%), Positives = 206/393 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + +E ++P++ + +P+ + P+E + 

Sbjct: 971 EETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETT 1030 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+ 
Sbjct: 1031 YAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 1090 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E +P++ + P+E + +P+E P++ + P+E ++P++ + +p + 

Sbjct: 1091 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1150 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P+E + P+ + +P+E +P+E ++P+ 
Sbjct: 1151 YGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPT 1210 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + + P+E ++P E + + E + +P+E ++ E + + E 
Sbjct: 1211 EETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETM 1270 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P +++ P E + +P+E ++ + +T ++P+E + P+G +E + + 

Sbjct: 1271 YAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPT 1330 

Query: 861 ERTRHSPSEMRPGRP SGRNHCSPSE 885 

E T ++P E P P S C+ E 

Sbjct: 1331 EETTYAPMEETPYEPAEESTSTVSTEKPCNTEE 1363 

Score = 419 (62.9 bits), Expect = 3.0e-35, P = 3.0e-35
Identities = 83/411 (20%), Positives = 215/411 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + E + + P++ + +P+ + +p E + 

Sbjct: 947 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1006 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++ +E + +P+E + +P+ 
Sbjct: 1007 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 1066 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E P++ + P+E + +P+E +P++ + P+E ++P++ + p + 

Sbjct: 1067 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 1126 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P E + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1127 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1186 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

++ ++P+E + ++P+E ++P E + ++ E + P+E ++ E + + E + 
Sbjct: 1187 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETT 1246 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P+E ++ +T + P+E + +P+ +E + + 

Sbjct: 1247 YAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPT 1306 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRD 912 

E T + P+ P+ +P+E + +P++E Y P E + ++S + 

Sbjct: 1307 EETTYEPTGETTYAPTEETTYAPTEETTYAPMEE-TPYE-PAEESTSTVSTE 1356 

Score = 415 (62.3 bits), Expect = 8.0e-35, P = 8.0e-35
Identities = 84/423 (19%), Positives = 218/423 (51%) 

Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531 

P P + T + K+ T+ ++ E T P+E T + P+ P+E + + 

Sbjct: 878 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 936 

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591 

E ++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++ 

Sbjct: 937 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 996 

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651 
+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E + 
Sbjct: 997 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 1056

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

P++ + p+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 1057 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 1116 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 1117 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 1176 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 1177 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 1236 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + +P+ +E + + E T ++P + p+ +P+E + + 

Sbjct: 1237 TTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYA 1296 

Query: 891 PLKE 894 
P +E 

Sbjct: 1297 PTEE 1300 

Score = 403 (60.5 bits), Expect = 1.6e-33, P = 1.6e-33
Identities = 84/394 (21%), Positives = 213/394 (54%)

Query: 501 RERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERS 560 

RE T PSE T + P +P+E+ +E + + ++ +p++ ++P+ER 

Sbjct: 319 REETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTYVTEESTY-APTKSETNAPTERM 375

Query: 561 WRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSP 620 

+ ++ C E + ++ +E ++P++ + P++ ++P+E + P+E + +P 
Sbjct: 376 HYAHIEKP-CDT-EVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTP 433 

Query: 621 SERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRS 680 

+E +P++ + P+E++ +P+E +P++ + P+E ++P+K + +P + 
Sbjct: 434 TEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 493 

Query: 681 HRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSP 740 

+ +E + ++P+E + ++P+E + P+E + +P+E + +P+E +P+E ++P 
Sbjct: 494 TYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAP 553 

Query: 741 SEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISER 799 

+E++ ++P+E + + P+E ++P E + ++E++PE++ E + + E 
Sbjct: 554 TEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEET 613 

Query: 800 SHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSS 859 

+ P+E++ +P E + +P+E ++S+ +T ++P+E + +P+ +E + + 

Sbjct: 614 PYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAP 673 

Query: 860 CERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P+E + +P +E 

Sbjct: 674 TEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708 

Score = 398 (59.7 bits), Expect = 5.5e-33, P = 5.5e-33 
Identities = 84/402 (20%), Positives = 209/402 (51%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 992 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 1050 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + p+E + +P++ +p E + ++ ++P++ + 

Sbjct: 1051 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 1110 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 1111 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1170 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 1171 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 1230 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E + ++P + + p E + ++ 
Sbjct: 1231 YAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPT 1290 

Query: 774 ERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCH 833 

E ♦ +P+E + E E ++ P+ ++ +p E + +P+E ++ +T + 

Sbjct: 1291 EATTYAPTEETPYAPTE ETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPY 1343 

Query: 834 SPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 
P+E S+S+ TE+ +ET PS+ P+
Sbjct: 1344 EPAEESTSTVSTEKPCNTEEFTDEPTDEPT-DEPSDEPTDEPT 1385 

Score = 368 (55.2 bits), Expect = 9.5e-30, P = 9.5e-30
Identities = 79/386 (20%), Positives = 211/386 (54%)

Query: 524 PSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSER 583

PS+ ++ + E + P + + +PS +P E + +P+++ + E + + ++E

Sbjct: 303 PSDETEAPT-EGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTY-VTEE 358

Query: 584 GLHSPSQRSHRGPSQRRHHSPSER SHRSPSERSHRSPSERRHRSPSQRSHRGPS 637 

++P++ P++R H++ E+ + +P+E + +P+E +P++ + P+ 

Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 638 ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSH 697 

E + P+E +P++ + P+E ++P++++ +P + +P+E + + P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 698 HSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPS 757 

++P++ ++P+E + + +E + +P+E +P+E + P+E++ ++P+E + ++P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 758 ERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSR 816 

E ++P E + ++ E + +P+E + E + + E +++P+E++ +P+E + 
Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 817 CSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 

+P+E ++ + +T + P+E + +P+ +E + +S E T ++P+E P+ 

Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 877 GRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

P+E + +P +E Y+ P E +++ 
Sbjct: 659 EETPYEPTEETTYAPTEE-TTYA-PTEETTYA 688 

Score = 337 (50.6 bits), Expect = 2.1e-26, P = 2.1e-26
Identities = 66/328 (20%), Positives = 170/328 (51%) 


Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561
E T P+E T + P+ +P+E + + E ++P++ + +P+ + +P+E +

Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621
P++ +P E + ++ +E +++P + + GP++ ++P+E + +P+E + +P+
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681
E P+ + P+E + +P+E +P++ + P+E + P++ + +P +

Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741
P+E + ++P+E + ++P+E ++P+E + +P + + P+E +P+E ++P+
Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRIS 797

E++ ++P+E + + P+ ++P E + ++ E + +P E + E S +S
Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKP 1358

Query: 798 ERSHSPSEKSHLSPLERSRCSPSE 821

E + P+++ P + P++
Sbjct: 1359 CNTEEFTDEPTDEPTDEPSDEPTDEPTD 1386


Score = 333 (50.0 bits), Expect = 5.7e-26, P = 5.7e-26
Identities = 63/320 (19%), Positives = 166/320 (51%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + p+ + +P+E + 

Sbjct: 1075 EETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1134

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P++ +P E + + +E ++P++ + P++ ++P+E + P+ ■ + +P+ 
Sbjct: 1135 YAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPT 1194 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E + P++ + +P + 

Sbjct: 1195 EETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETT 1254 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P + + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1255 YAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1314 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRISERSH 801 
++ ++P+E + ++P+E ++P+E ++ ES+S+ +E+ E+
Sbjct: 1315 GETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKPCNTEEFTDEPTDEPTD 1374

Query: 802 SPSEKSHLSPLERSRCSPSE 821 

PS++ P + P++ 
Sbjct: 1375 EPSDEPTDEPTDEPTDLPTD 1394 

Score = 303 (45.5 bits), Expect = 9.6e-23, P = 9.6e-23
Identities = 70/322 (21%), Positives = 170/322 (52%)

Query: 584 GLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCS 643 

G + PS + P++ + P E + +PSE + +P E +P+++ + E ++ + 
Sbjct: 299 GGYEPSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY-DVEETTYVT 356

Query: 644 PSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSER 703 

E +P++ P+ER H++ ++ + + +P+E + ++P+E + ++P+E 

Sbjct: 357 --EESTYAPTKSETNAPTERMHYAHIEKPCDTEV--TMYAPTEETTYAPTEETTYAPTEE 412

Query: 704 RHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHS 763 

++P+E + P+E + +P+E +P+E ++P+EK+ ++P+E + ++P+E + 
Sbjct: 413 TTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYE 472 

Query: 764 PLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSER 822 

P E + ++ + + +P+E ■!-+ S E + + E +++P+E++ P E + +P+E 
Sbjct: 473 PTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 532 

Query: 823 RGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCS 882 

++ + +T ++P+E + +P+ +E + E T ++P+E P+ + 

Sbjct: 533 TTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYA 592 

Query: 883 PSERSRRSPLKEGLKYSFPGERP 905 

P E + +P +E Y+ E P 
Sbjct: 593 PIEETTYAPTEE-TTYAPAEETP 614 

Score = 151 (22.7 bits), Expect = 2.0e-06, P = 2.0e-06
Identities = 45/198 (22%), Positives = 103/198 (52%)

Query: 716 PSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLER 775 

PS+ + +P+E P E +PSE + ++P E + ++P+E+ +E + + + E 

Sbjct: 303 PSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPYD--VEETTY-VTEE 358

Query: 776 SHRSPSERRSHRSFERSHRRISERS HSPSEKSHLSPLERSRCSPSERRGHSSS 828 

S +P++ ++ ER H E+ ++P+E++ +P E + +P+E ++ + 

Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 829 GKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSR 888 

+T + P+E + +P+ +E + + E+T ++P+E P+ p+E + 

Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 889 RSPLKEGLKYSFPGERPSHSLSRD 912 

+P KE Y+ P E +++ + + 
Sbjct: 479 YAPTKE-TTYA-PTEETTYASTEE 500 


Pedant information for DKFZphtes3_8gll, frame 2 


Report for DKFZphtes3_8gll . 2 


[LENGTH] 954

[MW] 110063.05

[pI] 11.40

[PROSITE] ATP_GTP_A

[KW] Irregular

[KW] LOW COMPLEXITY 27.67 %


SEQ ESSLSIFYDREDLVPMEESEDSQSDSQTRISESQHSLKPNYLSQAKTDFSEQFQLLEDLQ 

SEG xxxxxxxxxxx 

PRD ccceeeccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ LKIAAKLLRSQIPPDVPPPLASGLVLKYPICLQCGRCSGLNCHHKLQTTSGPYLLIYPQL 

PRD hhhhhhhhhhcccccccccccceeeeecceeecccccccccccccccccccceeeehhhh 

SEQ HLVRTPEGHGEVRLHLGFRLRIGKRSQISKYRERDRPVIRRSPISPSQRKAKIYTQASKS 

SEG 

PRD hcccccccccceeecccceeeccccccccccccccceeeeeccccccchhhhhhhccccc 

SEQ PTSTIDLQSGPSQSPAPVQVYIRRGQRSRPDLVEKTKTRAPGHYEFTQVHNLPESDSEST 
SEG

PRD ccccccccccccccccceeeeeeeccccccchhhhhhcccccceeeeeecccccccccch

SEQ QNEKRAKVRTKKTSDSKYPMKRITKRLRKHRKFYTNSRTTIESPSRELAAHLRRKRIGAT 

SEG 

PRD hhhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhcc 

SEQ QTSTASLKRQPKKPSQPKFMQLLFQSLKRAFQTAHRVIASVGRKPVDGTRPDNLWASKNY 

SEG 

PRD ccchhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ YPKQNARDYCLPSSIKRDKRSADKLTPAGSTIKQEDILWGGTVQCRSAQQPRRAYSFQPR 

SEG 

PRD cccccccccccccccccccccccccccccccccccceeeccccccccccccccccccccc 

SEQ PLRLPKPTDSQSGIAFQTASVGQPLRTVQKDSSSRSKKNFYRNETSSQESKNLSTPGTRV 

SEG 

PRD ccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccccccee 

SEQ QARGRILPGSPVKRTWHRHLKDKLTHKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPS 

SEG xxxxx 

PRD eeecccccccccccccccccccccccccccccceeeeccccccccccccccccccccccc 

SEQ ERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGL 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD chhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERSRHSLLERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ SSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSE 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ RSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTTLLGTTHKNPKAGQVWRPEATR 

SEG 

PRD ccccccccccceeecccccccccccccccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_8gll.2
PS00017 839->847 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_8gll.2)
DKFZphtes3_8g5


group: testes derived 

DKFZphtes3_8g5 encodes a novel 544 amino acid protein nearly identical to human KIAA0875.
The novel protein is a new splice variant of KIAA0875.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes.

KIAA0875, alternatively spliced

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2762 bp 

No poly A stretch found, no polyadenylation signal found 

1 CCGACATCGG CCGTGTCTCC AGCACCTGCC GGCGGCTGCG CGAGCTGTGC 
51 CAGAGCAGCG GGAAGGTGTG GAAGGAGCAG TTCCGGGTGA GGTGACCTTC 
101 CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT 
151 ATAAAGTTCG GCAAAAAGCT GGGTTAGAAG CGCGGAAGAT TGTAGCCTCG 
201 TTCTCAAAGA GGTTCTTTTC AGAGCACGTT CCTTGTAATG GCTTCAGTGA 
251 CATTGAGAAC CTTGAAGGAC CAGAGATTTT TTTTGAGGAT GAACTGGTGT 
301 GTATCCTAAA TATGGAAGGA AGAAAAGCTT TGACCTGGAA ATACTACGCA 
351 AAAAAAATTC TTTACTACCT GCGGCAACAG AAGATCTTAA ATAATCTTAA 
401 GGCCTTTCTT CAGCAGCCAG ATGACTATGA GTCGTATCTT GAAGGTGCTG 
451 TATATATTGA CCAGTACTGC AATCCTCTCT CCGACATCAG CCTCAAAGAC 
501 ATCCAGGCCC AAATTGACAG CATCGTGGAG CTTGTTTGCA AAACCCTTCG 
551 GGGCATAAAC AGTCGCCACC CCAGCTTGGC CTTCAAGGCA GGTGAATCAT 
601 CCATGATAAT GGAAATAGAA CTCCAGAGCC AGGTGCTGGA TGCCATGAAC 
651 TATGTCCTTT ACGACCAACT GAAGTTCAAG GGGAATCGAA TGGATTACTA 
701 TAATGCCCTC AACTTATATA TGCATCAGGT TTTGATTCGC AGAACAGGAA 
751 TCCCAATCAG CATGTCTCTG CTCTATTTGA CAATTGCTCG GCAGTTGGGA 
801 GTCCCACTGG AGCCTGTCAA CTTCCCAAGT CACTTCTTAT TAAGGTGGTG 
851 CCAAGGCGCA GAAGGGGCGA CCCTGGACAT CTTTGACTAC ATCTACATAG 
901 ATGCTTTTGG GAAAGGCAAG CAGCTGACAG TGAAAGAATG CGAGTACTTG 
951 ATCGGCCAGC ACGTGACTGC AGCACTGTAT GGGGTGGTCA ATGTCAAGAA 
1001 GGTGTTACAG AGAATGGTGG GAAACCTGTT AAGCCTGGGG AAGCGGGAAG 
1051 GCATCGACCA GTCATACCAG CTCCTGAGAG ACTCGCTGGA TCTCTATCTG 
1101 GCAATGTACC CGGACCAGGT GCAGCTTCTC CTCCTCCAAG CCAGGCTTTA 
1151 CTTCCACCTG GGAATCTGGC CAGAGAAGTC TTTCTGTCTT GTTTTGAAGG 
1201 TGCTTGACAT CCTCCAGCAC ATCCAAACCC TAGACCCGGG GCAGCACGGG 
1251 GCGGTGGGCT ACCTGGTGCA GCACACTCTA GAGCACATTG AGCGCAAAAA 
1301 GGAGGAGGTG GGCGTAGAGG TGAAGCTGCG CTCCGATGAG AAGCACAGAG 
1351 ATGTCTGCTA CTCCATCGGG CTCATTATGA AGCATAAGAG GTATGGCTAT
1401 AACTGTGTGA TCTACGGCTG GGACCCCACC TGCATGATGG GACACGAGTG 
1451 GATCCGGAAC ATGAACGTCC ACAGCCTGCC GCACGGCCAC CACCAGCCTT 
1501 TCTATAACGT GCTGGTGGAG GACGGCTCCT GTCGATACGC AGCCCAAGAA 
1551 AACTTGGAAT ATAACGTGGA GCCTCAAGAA ATCTCACACC CTGACGTGGG 
1601 ACGCTATTTC TCAGAGTTTA CTGGCACTCA CTACATCCCA AACGCAGAGC 
1651 TGGAGATCCG GTATCCAGAA GATCTGGAGT TTGTCTATGA AACGGTGCAG 
1701 AATATTTACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTAGAGAG 
1751 GACATTGCAC CTTTGCTGCT GCTGCTATCT TCCAAGAGAA CGGGACTCCG 
1801 GAAGAAGACG TCTCCACGGA GCCCTCGGGA CCTGCTGCAC CAGGAAAGCC 
1851 ACTCCACCAG TAGTGCTGGT TGCCTCCTAC TAAGTTTAAA TACCGTGTGC 
1901 TCTTCCCCAG CTGCAAAGAC AATGTTGCTC TCCGCCTACA CTAGTGAATT 
1951 AATCTGAAAG GCACTGTGTC AGTGGCATGG CTTGTATGCT TGTCCTGTGG 
2001 TGACAGTTTG TGACATTCTG TCTTCATGAG GTCTCACAGT CGACGCTCCT 
2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT GCATTTGTCT 
2101 CAGAACATTT CCTTGGCTGG ACAGATGGGG TTATGCATTT GCAATAATTT 
2151 CCTTCTGATT TCTCTGTGGA ACGTGTTCGG TCCCGAGTGA GGACTGTGTG 
2201 TCTTTTTACC CTGAAGTTAG TTGCATATTC AGAGGTAAAG TTGTGTGCTA 
2251 TCTTGGCAGC ATCTTAGAGA TGGAGACATT AACAAGCTAA TGGTAATTAG 
2301 AATCATTTGA ATTTATTTTT TTCTAATATG TGAAACACAG ATTTCAAGTG 
2351 TTTTATCTTT TTTTTTTTTA AATTTAAATG GGAATATAAC ACAGTTTTCC 
2401 CTTCCATATT CCTCTCTTGA GTTTATGCAC ATCTCTATAA ATCATTAGTT 
2451 TTCTATTTTA TTACATAAAA TTCTTTTAGA AAATGCAAAT AGTGAACTTT 
2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATGACT 
2551 ACTTTTATTT TTTAATTTAA AAAATCTACT TCAGTATCAT GAGTAGGTCT 
2601 TACATCAGTG ATGGGTTCTT TTTGTAGTGA GACATACAAA TCTGATGTTA 
2651 ATGTTTGCTC TTAGAAGTCA TACTCCATGG TCTTCAAAGA CCAAAAAATG
2701 AGGTTTTGCT TTTGTAATCA GGAAAAAAAA AATTAATGAA CCTTAAAAAA 
2751 AAAAAAAAAA GG 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 
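
As a quick consistency check on the ORF coordinates reported above, the following Python sketch recomputes the peptide length from the 1-based, inclusive ORF boundaries and translates the first few codons of the insert. The function names are illustrative assumptions and are not part of the original analysis; the codon table is the standard genetic code, and the test fragment is copied from positions 105-131 of the nucleotide sequence listed above.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # ORF from 105 bp to 1736 bp, as reported for frame 3.
    print(orf_peptide_length(105, 1736))
    # Positions 105-131 of the insert (hypothetical hand-copied fragment).
    print(translate("ATGAAACACTACAGCCCCACCGACTAC"))
```

Under these assumptions the coordinates give 544 codons, matching the stated peptide length, and the fragment translates to the first nine residues of the listed peptide.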


1 MKHYSPTDYV NWLEEYKVRQ KAGLEARKIV ASFSKRFFSE HVPCNGFSDI 
51 ENLEGPEIFF EDELVCILNM EGRKALTWKY YAKKILYYLR QQKILNNLKA
101 FLQQPDDYES YLEGAVYIDQ YCNPLSDISL KDIQAQIDSI VELVCKTLRG 
151 INSRHPSLAF KAGESSMIME IELQSQVLDA MNYVLYDQLK FKGNRMDYYN
201 ALNLYMHQVL IRRTGIPISM SLLYLTIARQ LGVPLEPVNF PSHFLLRWCQ 
251 GAEGATLDIF DYIYIDAFGK GKQLTVKECE YLIGQHVTAA LYGVVNVKKV 
301 LQRMVGNLLS LGKREGIDQS YQLLRDSLDL YLAMYPDQVQ LLLLQARLYF 
351 HLGIWPEKSF CLVLKVLDIL QHIQTLDPGQ HGAVGYLVQH TLEHIERKKE
401 EVGVEVKLRS DEKHRDVCYS IGLIMKHKRY GYNCVIYGWD PTCMMGHEWI
451 RNMNVHSLPH GHHQPFYNVL VEDGSCRYAA QENLEYNVEP QEISHPDVGR
501 YFSEFTGTHY IPNAELEIRY PEDLEFVYET VQNIYSAKKE NIDE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g5, frame 3 

TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; 
Homo sapiens mRNA for KIAA0875 protein, partial cds., N = 1, Score =
2832, P = 5.5e-295


>TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo
sapiens mRNA for KIAA0875 protein, partial cds. 
Length = 621 

HSPs: 

Score = 2832 (424.9 bits), Expect = 5.5e-295, P = 5.5e-295
Identities = 537/544 (98%), Positives = 537/544 (98%) 

Query: 1 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 60

MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF
Sbjct: 85 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 144

Query: 61 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 120

EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ
Sbjct: 145 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 204

Query: 121 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 180

YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA
Sbjct: 205 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 264

Query: 181 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 240
MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF
Sbjct: 265 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 324

Query: 241 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 300
PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV
Sbjct: 325 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 384

Query: 301 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 360
LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK
Sbjct: 385 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK 442

Query: 361 CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 420

VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS
Sbjct: 443 VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 497

Query: 421 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 480

IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA
Sbjct: 498 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 557

Query: 481 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 540

QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE
Sbjct: 558 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 617
Query: 541 NIDE 544 
NIDE 

Sbjct: 618 NIDE 621 
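
The percentages in the BLAST headers appear to be truncated toward zero rather than rounded: 537/544 is about 98.7% but is reported as 98%, and 228/440 is about 51.8% but is reported as 51%. The short Python sketch below reproduces the reported figures under that assumption; `percent_truncated` is a hypothetical helper name, not a function of the BLAST software.

```python
def percent_truncated(matches, length):
    """Integer percentage of matches over alignment length, truncated toward zero."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= matches <= length:
        raise ValueError("matches must lie between 0 and the alignment length")
    return (100 * matches) // length


if __name__ == "__main__":
    # (matches, alignment length) pairs taken from the headers in this section.
    for matches, length in [(537, 544), (228, 440), (84, 423)]:
        print(f"{matches}/{length} ({percent_truncated(matches, length)}%)")
```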

Pedant information for DKFZphtes3_8g5, frame 3
Report for DKFZphtes3_8g5.3

[LENGTH] 544

[MW] 63307.22

[pI] 5.82

^Vor ^>^s^i^'t^^^ -fr'^''' 

[KW] Alpha_Beta

[KW] LOW COMPLEXITY 1.84 %

SEQ MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 

PRO cccccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhcrc^ 

SEQ EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 

PRO eeeeeeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc^ 

SEQ ycnplsdislkdiqaqidsivelvcktlrginsrhpslafkagessmimeielqsqvlda 

PRD ccccccccchhhhhhhhhhhhhhhhhhcccccccccceeeecccchh^^ 

SEQ mnyvlydqlkfkgnrmdyynalnlymhqvlirrtgipismsllyltiarqlgvplepvnf 

PRD hhhhhccccccccccchhhhhhhhhhhhhhhhhcc^ 

SEQ pshfllrwcqgaegatldifdyiyidafgkgkqltvkeceyligqhvtaalygwnvkkv 

PRD <=ceeeeeeccccccceeeeeeeeeeeccccceeeee4hhhhhhhhhhhh^^ 

SEQ lqrmvgnllslgkregidqsyqllrdsldlylamypdqvqllllqarlyfhlgiwpeksf 

PRD hhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhhhhhh^ 

SEQ clvlkvldilqhiqtldpgqhgavgylvqhtlehierkkeevgvevklrsdekhrdvcys

PRD ehhhhhhhhhhhhhccccccccchhhhhhh^

SEQ iglimkhkrygyncviygwdptcmmghewirnmnvhslphghhqpfynvlvedgscryaa

PRD cccchhhhhhhceeeeecccccccchhhhhhhhhhhcccccccccccee^

SEQ qenleynvepqeishpdvgryfseftgthyipnaeleirypedlefvyetvqniysakke

PRD hhhhhhhhcccccccccceeeeccccccccccchhhhhhccchhhhhhhhhhh^

SEQ NIDE

PRD CCCC


(No Prosite data available for DKFZphtes3_8g5.3)
(No Pfam data available for DKFZphtes3_8g5.3) 
DKFZphtes3_8m10


group: nucleic acid management 

DKFZphtes3_8m10 encodes a novel 221 amino acid protein with strong similarity to
polyadenylate-binding proteins.

The poly(A)-binding protein (PABP) binds to the 3'-poly(A) tail found on most
eukaryotic messenger RNAs (mRNAs) and, together with the poly(A) tail, has been
implicated in governing the stability and the translation of mRNA.

The new protein can find application in modulation of mRNA translation and
processing/stability.


strong similarity to polyadenylate-binding protein 

frame shift at Bp 707-710 

Sequenced by MediGenomix

Locus : unknown 

Insert length: 2107 bp 

Poly A stretch at pos. 2052, polyadenylation signal at pos. 2033 
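
The reported signal position can be cross-checked against the insert sequence. The following is a minimal sketch, assuming that the annotation scanned for the canonical AATAAA/ATTAAA hexamers and that positions are counted from zero; both points are inferred from the listing and are not stated in it. The function name find_polya_signals is hypothetical.

```python
import re

# Canonical polyadenylation hexamers (assumption: the annotation used these).
SIGNAL_PATTERN = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_polya_signals(seq, offset=0):
    """Return (position, hexamer) pairs; positions are 0-based plus `offset`."""
    # Strip coordinates, spaces and any other non-nucleotide characters.
    cleaned = re.sub(r"[^ACGTacgt]", "", seq).upper()
    return [(offset + m.start(), m.group(1))
            for m in SIGNAL_PATTERN.finditer(cleaned)]


# Bases 2031-2050 (1-based) of the insert sequence, i.e. 0-based offset 2030.
tail = "TAAAATAAAA AATGCAAAAT"
hits = find_polya_signals(tail, offset=2030)
print(hits)
```

Under these assumptions the hexamer found in this window falls at the position given in the annotation line.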


1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC 
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA 
101 CCCCGACGTG ACTGAGGCGA TGCTCTACGA GAAGTTCAGC CCGGCAGGGC 
151 CCATCCTCTC CATCCGGATC TGCAGGGACT TGATCACCAG CGGCTCCTCC 
201 AACTACGCGT ATGTGAACTT CCAGCATACG AAGGACGCGG AGCATGCTCT 
251 GGACACCATG AATTTTGATG TTATAAAGGG CAAGCCAGTA CGCATCATGT 
301 GGTCTCAGCG TGATCCATCA CTTCGAAAAA GTGGAGTGGG CAACATATTC 
351 GTTAAAAATC TGGATAAGTC CATTAATAAT AAAGCACTGT ATGATACAGT 
401 TTCTGCTTTT GGTAACATCC TTTCGTGTAA CGTGGTTTGT GATGAAAATG 
451 GTTCCAAGGG TTATGGATTT GTACACTTTG AGACACACGA AGCAGCTGAA 
501 AGAGCTATTA AAAAAATGAA CGGAATGCTC CTAAATGGTC GCAAAGTATT 
551 TGTTGGACAA TTTAAGTCTC GTAAAGAACG AGAAGCTGAA CTTGGAGCTA 
601 GGGCAAAAGA GTTCCCCAAT GTTTACATCA AGAATTTTGG AGAAGACATG 
651 GATGATGAGC GCCTTAAGGA TCTCTTTGGC AAGTTCGGGC CCGCCTTAAG 
701 TGTGAATTAA TGACCGATGA AAGTGGAAAA TCCAAAGGAT TTGGATTTGT 
751 AAGCTTTGAA AGGCATGAAG ATGCACAGAA AGCTGTAGAT GAGATGAATG 
801 GAAAGGAGCT CAATGGAAAA CAAATTTACG TTGGTCGAGC TCAGAAAAAA 
851 GTGGAACGGC AGACGGAACT TAAGCGCACA TTTGAACAGA TGAAGCAAGA 
901 TAGGATCACC AGATACCAGG TTGTTAATCT TTATGTGAAA AATCTTGATG 
951 ATGGTATTGA TGATGAACGT CTCCGGAAAG CGTTTTCTCC ATTTGGTACA 
1001 ATCACTAGTG CAAAGGTTAT GATGGAAGGT GGTCGCAGCA AAGGGTTTGG 
1051 TTTTGTATGT TTCTCCTCCC CAGAAGAAGC CACTAAAGCA GTTACAGAAA 
1101 TGAACGGTAG AATTGTGGCC ACAAAGCCAT TGTATGTAGC TTTAGCTCAG 
1151 CGCAAAGAAG AGCGCCAGGC TTACCTCACT AACGAGTATA TGCAGAGAAT 
1201 GGCAAGTGTA CGAGCTGTGC CCAACCAGCG AGCACCTCCT TCAGGTTACT 
1251 TCATGACAGC TGTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
1301 AGCCAAATTG CTCGACTAAG ACCAAGTCCT CGCTGGACTG CTCAGGGTGC 
1351 CAGACCTCAT CCATTCCAAA ATAAGCCCAG TGCTATCCGC CCAGGTGCTC 
1401 CTAGAGTACC ATTTAGTACT ATGAGACCAG CTTCTTCACA GGTTCCACGA 
1451 GTCATGTCAA CGCAGCGTGT TGCTAACACA TCAACACAGA CAGTGGGTCC 
1501 ACGTCCTGCA GCTGCTGCTG CTGCTGCAGC TACCCCTGCT GTGCGCACGG 
1551 TTCCACGGTA TAAATATGCT GCGGGAGTTC GCAATCCTCA GCAACATCGT 
1601 AATGCACAGC CACAAGTTAC AATGCAACAG CTTGCTGTTC ATGTACAAGG 
1651 TCAGGAAACT TTGACTGCCT CCAGGTTGGC ATCTGCCCCT CCTCAAAAGC 
1701 AAAAGCAAAT GTTAGGTGAA CGGCTCTTTC CTCTTATTCA AGCCATGCAC 
1751 CCTACTCTTG CTGGGAAAAT CACTGGCATG TTGTTGGAGA TTGATAATTC 
1801 AGAACTTCTT TATATGCTCG AGTCTCCAGA GTCACTCCGT TCTAAGGTTG 
1851 ATGAAGCTGT AGCTGTACTA CAAGCCCACC AAGCTAAAGA GGCTACCCAG 
1901 AAAGCAGTTA ACAGTGCTAC CGGTGTTCCA ACTGTTTAAA ATTGATCAGA 
1951 GACCACGAAA AGAAATTTGT GCTTCACCGA AGAAAAATAT CTAAACATCG 
2001 AGAAACTATG GGAAAAAAAA TTGCAAAATC TAAAATAAAA AATGCAAAAT 
2051 CTAAAATAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2101 AAAAAGG 
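
For downstream work the numbered, space-grouped listing has to be joined into one continuous string. The sketch below shows one way to do this; the helper name parse_numbered_sequence is hypothetical, and the example uses only the first two rows of the listing above.

```python
import re


def parse_numbered_sequence(text):
    """Join a position-numbered, space-grouped listing into one uppercase string."""
    parts = []
    for line in text.splitlines():
        # Drop the leading coordinate, then keep only the sequence letters.
        body = re.sub(r"^\s*\d+\s*", "", line)
        parts.append(re.sub(r"[^A-Za-z]", "", body))
    return "".join(parts).upper()


listing = """1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA"""
seq = parse_numbered_sequence(listing)

# Bases 45-47 (1-based) are the first codon of the frame 3 ORF reported below.
print(len(seq), seq[44:47])
```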


BLAST Results 


Entry HSPOLYAB from database EMBL: 

Human mRNA for polyA binding protein 

Score = 5420, P = 0.0e+00, identities = 1162/1243

Medline entries

No Medline entry 

Peptide information for frame 2 


ORF from 707 bp to 1936 bp; peptide length: 410 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (10-18)
RNP_1 (112-120)
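
The peptide lengths quoted for the two reading frames follow directly from the ORF coordinates. A minimal sketch of the arithmetic, assuming inclusive 1-based coordinates (the function name orf_peptide_length is hypothetical):

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of whole codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    return span // 3


frame2 = orf_peptide_length(707, 1936)  # ORF reported for frame 2
frame3 = orf_peptide_length(45, 707)    # ORF reported for frame 3
print(frame2, frame3)
```

Both ORFs meet at bp 707, which is consistent with the frame shift annotated at bp 707-710.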


1 LMTDESGKSK GFGFVSFERH EDAQKAVDEM NGKELNGKQI YVGRAQKKVE
51 RQTELKRTFE QMKQDRITRY QVVNLYVKNL DDGIDDERLR KAFSPFGTIT
101 SAKVMMEGGR SKGFGFVCFS SPEEATKAVT EMNGRIVATK PLYVALAQRK
151 EERQAYLTNE YMQRMASVRA VPNQRAPPSG YFMTAVPQTQ NHAAYYPPSQ
201 IARLRPSPRW TAQGARPHPF QNKPSAIRPG APRVPFSTMR PASSQVPRVM
251 STQRVANTST QTVGPRPAAA AAAAATPAVR TVPRYKYAAG VRNPQQHRNA
301 QPQVTMQQLA VHVQGQETLT ASRLASAPPQ KQKQMLGERL FPLIQAMHPT
351 LAGKITGMLL EIDNSELLYM LESPESLRSK VDEAVAVLQA HQAKEATQKA
401 VNSATGVPTV

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 2
PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931,
P = 1.7e-199

PIR:I48718 poly (A) binding protein - mouse, N = 1, Score = 1928, P =
3.6e-199

>PIR:DNHUPA polyadenylate-binding protein - human
Length = 633

HSPs:

Score = 1931 (289.7 bits), Expect = 1.7e-199, P = 1.7e-199
Identities = 384/415 (92%), Positives = 394/415 (94%)

Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
           +MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKR FE
Sbjct: 219 VMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFE 278

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFS 120
           QMKQDRITRYQ VNLYVKNLDDGIDDERLRK FSPFGTITSAKVMMEGGRSKGFGFVCFS
Sbjct: 279 QMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFS 338

Query: 121 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPN------Q 174
           SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQA+LTN+YMQRMASVRAVPN      Q
Sbjct: 339 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQ 398

Query: 175 RAPPSGYFMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRV 234
            APPSGYFM A+PQTQN AAYYPPSQ+A+LRPSPRWTAQGARPHPFQN P AIRP APR
Sbjct: 399 PAPPSGYFMAAIPQTQNRAAYYPPSQVAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRP 458

Query: 235 PFSTMRPASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNP 294
           PFSTMRPASSQVPRVMSTQRVANTSTQT+GPRPAAAAAAA TPAVRTVP+YKYAAGVRNP
Sbjct: 459 PFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAA-TPAVRTVPQYKYAAGVRNP 517

Query: 295 QQHRNAQPQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGK 354
           QQH NAQPQVTMQQ AVHVQGQE LTAS LASAPPQ+QKQMLGERLFPLIQAMHPTLAGK
Sbjct: 518 QQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGK 577

Query: 355 ITGMLLEIDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 410
           ITGMLLEIDNSELL+MLESPESLRSKVDEAVAVLQAHQAKEA QKAVNSATGVPTV
Sbjct: 578 ITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV 633

Score = 315 (47.3 bits), Expect = 1.9e-27, P = 1.9e-27
Identities = 71/163 (43%), Positives = 102/163 (62%)

Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
           ++ DE+G SKG+GFV FE E A++A+++MNG LN ++++VGR + + ER+ EL +
Sbjct: 130 VVCDENG-SKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAK 188

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMM-EGGRSKGFGFVCF 119
           + N+Y+KN + +DDERL+ F P S KVM E G+SKGFGFV F
Sbjct: 189 EF----------TNVYIKNFGEDMDDERLKDLFGP---ALSVKVMTDESGKSKGFGFVSF 235

Query: 120 SSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQ 163
           E+A KAV EMNG+ + K +YV AQ+K ERQ L ++ Q
Sbjct: 236 ERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQ 279


Score = 214 (32.1 bits), Expect = 1.9e-14, P = 1.9e-14
Identities = 50/150 (33%), Positives = 87/150 (58%)

Query:   8 KSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQMKQDRI 67
           +S G+ +V+F++ DA++A+D MN + GK + + +Q R L+++
Sbjct:  50 RSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQ----RDPSLRKS--------- 96

Query:  68 TRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATK 127
           V N+++KNLD ID++ L FS FG I S KV+ + SKG+GFV F + E A +
Sbjct:  97 ---GVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAER 153

Query: 128 AVTEMNGRIVATKPLYVALAQRKEERQAYL 157
           A+ +MNG ++ + ++V + ++ER+A L
Sbjct: 154 AIEKMNGMLLNDRKVFVGRFKSRKEREAEL 183


Score = 120 (18.0 bits), Expect = 4.8e-04, P = 4.8e-04
Identities = 30/99 (30%), Positives = 54/99 (54%)

Query:  70 YQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVM--MEGGRSKGFGFVCFSSPEEATK 127
           Y + +LYV +L + + L + FSP G I S +V M RS G+ +V F P +A +
Sbjct:   8 YPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAER 67

Query: 128 AVTEMNGRIVATKPLYVALAQRKEE-RQAYLTNEYMQRM 165
           A+ MN ++ KP+ + +QR R++ + N +++ +
Sbjct:  68 ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNL 106
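
The percentages printed beside the Identities and Positives counts in the HSP headers above can be reproduced from the raw counts. The sketch below assumes the percentage is truncated to a whole number rather than rounded, which is what the figures in this listing suggest (for example 394/415 is shown as 94%); the function name blast_percent is hypothetical.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage, truncated rather than rounded."""
    return (100 * matches) // aligned


first_hsp = (blast_percent(384, 415), blast_percent(394, 415))
second_hsp = (blast_percent(71, 163), blast_percent(102, 163))
print(first_hsp, second_hsp)
```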




Peptide information for frame 3 



ORF from 45 bp to 707 bp; peptide length: 221 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (138-146) 


1 MNPSTPSYPT ASLYVGDLHP DVTEAMLYEK FSPAGPILSI RICRDLITSG 

51 SSNYAYVNFQ HTKDAEHALD TMNFDVIKGK PVRIMWSQRD PSLRKSGVGN 

101 IFVKNLDKSI NNKALYDTVS AFGNILSCNV VCDENGSKGY GFVHFETHEA 

151 AERAIKKMNG MLLNGRKVFV GQFKSRKERE AELGARAKEF PNVYIKNFGE 

201 DMDDERLKDL FGKFGPALSV N 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 3

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY (A) BINDING
PROTEIN 1) (PABP 1)., N = 1, Score = 1039, P = 5.7e-105

PIR:I48718 poly (A) binding protein - mouse, N = 1, Score = 1031, P =
4e-104

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1009,
P = 8.7e-102


>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY (A) BINDING
PROTEIN 1) (PABP 1).
Length = 636

HSPs: 
Score = 1039 (155.9 bits), Expect = 5.7e-105, P = 5.7e-105
Identities = 199/220 (90%), Positives = 205/220 (93%)

Query:   1 MNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQ 60
           MNPS PSYP ASLYVGDLHPDVTEAMLYEKFSPAGPILSIR+CRD+IT  S  YAYVNFQ
Sbjct:   1 MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQ 60

Query:  61 HTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNLDKSINNKALYDTVS 120
              DAE ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIF+KNLDKSI+NKALYDT S
Sbjct:  61 QPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFS 120

Query: 121 AFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSRKERE 180
           AFGNILSC VVCDENGSKGYGFVHFET EAAERAI+KMNGMLLN RKVFVG+FKSRKERE
Sbjct: 121 AFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKERE 180

Query: 181 AELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSV 220
           AELGARAKEF NVYIKNFGEDMDDERLKDLFGKFGPALSV
Sbjct: 181 AELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSV 220


Score = 275 (41.3 bits), Expect = 4.1e-23, P = 4.1e-23
Identities = 71/233 (30%), Positives = 120/233 (51%)

Query:   2 NPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 61
           +PS ++++ +L + LY+ FS G ILS ++ D S + + Q
Sbjct:  90 DPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQE 149

Query:  62 TKD-AEHALDTMNFDVIKGKPVRIMW-SQRDPSL--RKSGVGNIFVKNLDKSINNKALYD 117
Sbjct: 150 AAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKD 209

Query: 118 TVSAFGNILSCNVVCDENG-SKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSR 176
           SKG+GFV FE HE A++A+ +MNG LNG++++VG+ + +
Sbjct: 210 LFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKK 269

Query: 177 KEREAELGARAKEFP----------NVYIKNFGEDMDDERLKDLFGKFGPALS 219
           + N+Y+KN + +DDERL+ F FG S
Sbjct: 270 VERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITS 322


Score = 227 (34.1 bits), Expect = 6.3e-18, P = 6.3e-18
Identities = 57/187 (30%), Positives = 101/187 (54%)

Query:  12 SLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQHTKDAEHALDT 71
           D+ + L + F GP LS+++ D + S + +V+F+ +DA+ A+D
Sbjct: 192 NVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDE-SGKSKGFGFVSFERHEDAQKAVDE 250

Query:  72 MNFDVIKGKPVRIMWSQR-----------------DPSLRKSGVGNIFVKNLDKSINNKA 114
           MN + GK + + +Q+ D R GV N++VKNLD I+++
Sbjct: 251 MNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGV-NLYVKNLDDGIDDER 309

Query: 115 LYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFK 174
           L S FG I S V+ + SKG+GFV F + E A +A+ +MNG ++ + ++V +
Sbjct: 310 LRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQ 369

Query: 175 SRKEREAEL 183
           ++ER+A L
Sbjct: 370 RKEERQAHL 378


Score = 100 (15.0 bits), Expect = 2.3e-02, P = 2.3e-02
Identities = 26/99 (26%), Positives = 53/99 (53%)

Query:   8 YPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSG-SSNYAYVNFQHTKDAE 66
           Y +LYV +L + + L ++FSP G I S ++ ++ G S + +V F ++A
Sbjct: 291 YQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKV---MMEGGRSKGFGFVCFSSPEEAT 347

Query:  67 HALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 106
           A+ MN ++ KP+ + +QR R++ + N +++ +
Sbjct: 348 KAVTEMNGRIVATKPLYVALAQRKEE-RQAHLTNQYMQRM 386


Pedant information for DKFZphtes3_8m10, frame 2

Report for DKFZphtes3_8m10.2

[LENGTH] 409
[MW] 45235.68
[pI] 10.08
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY (A) BINDING PROTEIN 1) (PABP 1). 0.0
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-54
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-54
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-54
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-54
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 1e-15
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-12
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-12
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YNL175c] 4e-09
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 5e-08
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YHR086w] 3e-07
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 3e-07
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YHR086w] 3e-07
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 9e-07
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 3e-06
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 8e-06
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 2e-05
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 2e-05
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 3e-05
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-04
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 1e-17
[PIRKW] nucleus 0.0
[PIRKW] duplication 0.0
[PIRKW] RNA binding 0.0
[PIRKW] nucleolus 2e-09
[PIRKW] tandem repeat 2e-09
[PIRKW] single-stranded DNA binding 3e-06
[PIRKW] DNA binding 5e-13
[PIRKW] phosphoprotein 6e-10
[PIRKW] ribosome 3e-08
[PIRKW] mitochondrion 3e-08
[PIRKW] alternative splicing 9e-11
[PIRKW] chloroplast 2e-19
[PIRKW] transcription regulation 2e-07
[PIRKW] protein biosynthesis 3e-08
[SUPFAM] nucleolin 6e-10
[SUPFAM] glycine-rich RNA-binding protein 2e-07
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 2e-19
[SUPFAM] polyadenylate-binding protein 0.0
[SUPFAM] ribonucleoprotein repeat homology 0.0
[PROSITE] RNP_1 2
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain)
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 5.62 %


SEQ   MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ
SEG
1sxl-

SEQ   MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS
SEG
1sxl-             CEEEECCCTTTTHHHHHHHHTTTTCCCCCEEECTTTCTTTEEEECTTT

SEQ   PEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPNQRAPPSGY
SEG
1sxl- HHHHHHHHHHHTTTCCCCCCCBCCBCC

SEQ   FMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRVPFSTMRP
SEG
1sxl-

SEQ   ASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNPQQHRNAQ
SEG   xxxxxxxxxxxxxxxxxxxxxxx
1sxl-

SEQ   PQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGKITGMLLE
SEG
1sxl-

SEQ   IDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV
SEG
1sxl-
Prosite for DKFZphtes3_8m10.2

PS00030 9->17 RNP_1 PDOC00030
PS00030 111->119 RNP_1 PDOC00030


Pfam for DKFZphtes3_8m10.2

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain)

HMM        *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
            +YV+NL+ +++E LR +FS+FG I+S+++M+ E GRS+GF+FV F +
Query    74 LYVKNLDDGIDDERLRKAFSPFGTITSAKVMM--EGGRSKGFGFVCFSS 120

HMM         EEDAekAIdeMNGmeFmGRrlRV*
            +E+A+KA+ EMNG+++ ++++V
Query   121 PEEATKAVTEMNGRIVATKPLYV 143


Pedant information for DKFZphtes3_8m10, frame 3

Report for DKFZphtes3_8m10.3

[LENGTH] 235
[MW] 26308.08
[pI] 8.95
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY (A) BINDING PROTEIN 1) (PABP 1)
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-64
[FUNCAT] (illegible) [S. cerevisiae, YFR023w] 1e-24
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 1e-24
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 2e-19
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YOR319w] 2e-14
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-11
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-11
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 1e-09
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 1e-09
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 8e-09
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 2e-08
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 2e-08
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YBR212w] 3e-08
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-08
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 3e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 3e-04
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins
[BLOCKS] BL00900D Bacteriophage-type RNA polymerase family proteins signatur
[SCOP] (illegible) protein [(Drosophila melanogaster) 9e-23
[SCOP] d2u1a 4.34.7.1.2 U1A protein [human (Homo sapiens) 6e-24
[SCOP] (illegible) Nuclear ribonucleoprotein A1, RNP A1, UP 1e-13
[PIRKW] nucleus 1e-110
[PIRKW] duplication 1e-110
[PIRKW] RNA binding 1e-110
[PIRKW] nucleolus 4e-10
[PIRKW] tandem repeat 4e-10
[PIRKW] single-stranded DNA binding 1e-06
[PIRKW] DNA binding 9e-12
[PIRKW] phosphoprotein 4e-10
[PIRKW] mitochondrion 6e-07
[PIRKW] heterotrimer 4e-06
[PIRKW] alternative splicing 1e-15
[PIRKW] chloroplast 5e-11
[PIRKW] transcription regulation 3e-09
[PIRKW] GTP binding 2e-06
[SUPFAM] helix-destabilizing protein 1e-07
[SUPFAM] nucleolin 4e-10
[SUPFAM] glycine-rich RNA-binding protein 2e-07
[SUPFAM] yeast HRP1 protein 2e-08
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-25
[SUPFAM] polyadenylate-binding protein 1e-112
[SUPFAM] ribonucleoprotein repeat homology 1e-112
[PROSITE] RNP_1 1
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain)
[KW] All_Beta
[KW] 3D


SEQ   ERSRLVCLRAAVPRMNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDL
1ha1-                           EEEETTTTTTCHHHHHHHHGGGCCEEEEEEEETT

SEQ   ITSGSSNYAYVNFQHTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL
1ha1- TTTCEEEEEEEEECCHHHHHHHHHHTTEEE-TT---EEEEEEECTTTTCCCCCEEEEECC

SEQ   DKSINNKALYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGR
1ha1- TTTTCHHHHHHHHGGGCCEEEEEEEETTTTTCEEEEEEECCHHHHHHHH

SEQ   KVFVGQFKSRKEREAELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSVN
1ha1-


Prosite for DKFZphtes3_8m10.3
PS00030 152->160 RNP_1 PDOC00030

Pfam for DKFZphtes3_8m10.3

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain)

HMM        *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
            +YVG+L +D+TE +L + FS+ GPI+SIR+ RD T S +A+V+F+
Query    27 LYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 75

HMM         EEDAekAIdeMNGmeFmGRrlRV*
            DAE A+D+MN ++ G+++R+
Query    76 TKDAEHALDTMNFDVIKGKPVRI 98

HMM        *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
            I+V+NL+ +++ L D S FG I+S++++ D + S+G++FV FE+
Query   115 IFVKNLDKSINNKALYDTVSAFGNILSCNVVCD--ENGSKGYGFVHFET 161

HMM         EEDAekAIdeMNGmeFmGRrlRV*
            +E+AE+AI +MNGM+++GR++ V
Query   162 HEAAERAIKKMNGMLLNGRKVFV 184
DKFZphtes3_8p7

group: testes derived

DKFZphtes3_8p7 encodes a novel 412 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown

2 EST hits (both from testis libraries)

Sequenced by MediGenomix

Locus: unknown

Insert length: 2899 bp

Poly A stretch at pos. 2870, polyadenylation signal at pos. 2852

1 CCGACCCGCC CTGGGGTGCT GCGTGCGCTG CCTGCTCCCG CCTGAGGAAA 
51 ACACTGCCCA TGGCGCAAGG CCGGGAGCGC GACGAAGGCC CCCACTCCGC 
101 CGGCGGCGCG TCCTTGTCCG TGAGATGGGT GCAAGGATTC CCTAAGCAGA 
151 ATGTTCATTT GTCAACGACA ACACCATTTG CTACCCTTGT GGGAATTATG 
201 TAATATTTAT TAATATTGAA ACCAAGAAAA AGACTGTACT GCAGTGTAGT 
251 AATGGAATTG TGGGCGTCAT GGCAACTAAC ATCCCCTGTG AAGTTGTGGC 
301 TTTTTCTGAC CGGAAGCTAA AACCTCTCAT CTACGTATAC AGCTTTCCAG 
351 GATTGACCAG AAGGACCAAA TTGAAAGGCA ACATTCTCCT GGACTACACT 
401 TTACTTTCAT TCAGTTACTG TGGCACCTAC CTGGCTAGTT ACTCCTCTCT 
451 CCCAGAATTT GAACTGGCCC TTTGGAACTG GGAATCGAGT ATCATTTTGT 
501 GTAAGAAATC ACAGCCTGGA ATGGATGTGA ACCAAATGTC TTTTAACCCC 
551 ATGAACTGGC GCCAGCTGTG CTTATCAAGT CCAAGTACAG TGAGCGTGTG 
601 GACCATTGAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGGTCGGTGA 
651 AATTACCTCT AGAAGATGGG TCATTTTTTA ATGAAACGGA TGTCGTTTTC 
701 CCCCAGTCGT TGCCGAAAGA TCTCATCTAT GGTCCCGTGC TGCCACTGTC 
751 AGCCATTGCC GGGCTGGTAG GCAAAGAGGC AGAGACTTTC CGGCCGAAAG 
801 ATGATCTATA TCCTTTGCTT CACCCGACTA TGCATTGCTG GACTCCAACA 
851 AGTGACTTGT ACATTGGCTG TGAAGAGGGT CATCTTTTAA TGATTAATGG 
901 AGACACCTTG CAAGTGACTG TACTTAATAA GATAGAAGAG GAATCGCCAT 
951 TGGAAGACAG AAGAAATTTT ATCAGTCCAG TAACCTTGGT ATATCAGAAG 
1001 GAGGGCGTGC TGGCTTCTGG AATTGATGGC TTTGTGTATT CTTTTATTAT 
1051 TAAAGATAGA AGTTACATGA TCGAGGATTT TCTTGAGATT GAAAGACCTG 
1101 TAGAACATAT GACATTTTCT CCCAATTATA CAGTGTTGCT GATTCAAACA 
1151 GACAAGGGAT CTGTTTATAT CTACACTTTT GGTAAGGAGC CAACCTTAAA 
1201 TAAAGTCCTA GATGCTTGTG ATGGGAAATT TCAGGCAATT GACTTTATCA 
1251 CACCTGGAAC CCAATACTTC ATGACACTTA CATATTCAGG GGAAATTTGT 
1301 GTTTGGTGGC TGGAGGATTG TGCTTGTGTA AGCAAGATTT ATCTGAATAC 
1351 CCTAGCAACG GTTCTGGCTT GCTGTCCATC CTCCCTCTCT GCAGCCGTGG 
1401 GCACGGAGGA TGGCTCGGTC TACTTCATCA GCGTATATGA TAAGGAATCC 
1451 CCTCAGGTCG TGCACAAGGC CTTTCTCTCG GAATCGTCCG TGCAGCACGT 
1501 CGTGTAAGTC CTTTCTGCCT CCAGGAGCGG CTCCGTGTCA CACCCGTCTG 
1551 TTGAAAATTC TAGTGAAGCC ATCCTTTCTT TTAATTTTAA GTTTTACGTG 
1601 TTTCATTTGT TTTGAATGTT AATATATTCA CACAGTTCAA CACTCAAAAG 
1651 GTACAGAGGG CTGTGTAGTA AAGTACCCCC CATACCCAGG TCTGTCCTTG 
1701 CAGGCAGCCT GGTACCAATT TCTCATGTCT CTCCTGAGAT GTTTTATCCA 
1751 TGAACAAGCA AAACATAATA AGCACTTCTT TTTACTTGTA TCAATGGCCA 
1801 TCATGTGTGT ATAGTGTGCC AGGCACTTCT GCTGTATTAA CTCCATGAGG 
1851 TAAACACTCT TGTTGTCTCT ATTTGACAGG TGAGGAAGAT AAGGCACAAG 
1901 GATTTTAAAT AACTTGCTCA ATAGTACACA GATAGTGAAT GGCAAATGTT 
1951 GGGATTTGAA CCCAGGTAGT TGGGCTGCAG AGTCACTGCC TTTGCTCTTA 
2001 AAAGGAGAAA ACTATGTACA ATGCCTCATT TCTTTTTTCA CTTAATCGTA 
2051 TATCTTGGAG AATGTTTTAT ATCCACACAT AAAGACCAGC CTGATTATTT 
2101 GTATAGCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 
2151 ACGGTATATT AATGAACATT TAGAGTATTT CAAAACTTTT GAAGCAATAC 
2201 TTTTAAGATG ATAATATAGA GACATTAGAT TTGGACTTGT AGGTGCTATC 
2251 ATTATTACTG TTTCTTTTTA ATTTATTATA TTATTAGGTA TTAATAAGAA 
2301 CAGACATTTG TATTCTGCTT TACAGCTTGA GATCACTGTA GCTTGTGGCA 
2351 TGTGATCCTC AAAACACCAG TCAGAAAGGT GTTATTCTTA TCCCTATTAG 
2401 ACAAATTAGG GAATTCAGGG TTAGAGAGGT GAGGAAAAGC ATTGTCCAAG
2451 ATTACACATT ACACAGCTAG CACACTGAGG AGCTGGCCCT GCCACTGTGG 
2501 ACTGCCCAGC TCCACCACCC TAGCTCAGTG GGGAAGGATG GATAACCTCC 
2551 TTCCATTTAC CCCCTGCCTT TCTGCACTGT CATTTTTTTG TGCCTTTCCT 
2601 TTCTCAGATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTG 
2651 ATAAAGTTGT AGACATGTTT CACTACATTC TTCCTCCCAC TGCCAGGTAC 
2701 CAGACACAGG GTAATGAAAT GTCACACCCA CCACTAATTT GAGAATTGCT 
2751 TATTTGCGCT TGAAACATCA AGAAAGCTCT ACCGACAGAC ATGTTTCATT
2801 CACTTATGAT GAACCAACTG CCCATCTTTA CTGAATCTTC TTGACTGTAT 
2851 TTATTAAAGT TGCAATTTGG AAATAAAAAA AAAAAAAAAA AAAAAAAGG 


BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 


Peptide information for frame 2 


ORF from 269 bp to 1504 bp; peptide length: 412 
Category: putative protein 
Classification: no clue 
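
The peptide listed for this ORF can be checked by translating the insert sequence from the stated start position. The following is a minimal sketch using the standard genetic code; the function name translate is hypothetical, and the example input is bases 269-298 of the insert sequence above.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base until a stop codon or the end of input."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf_start = "ATGGCAACTAACATCCCCTGTGAAGTTGTG"  # bases 269-298 of the insert
peptide_start = translate(orf_start)
print(peptide_start)
```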


1 MATNIPCEVV AFSDRKLKPL IYVYSFPGLT RRTKLKGNIL LDYTLLSFSY
51 CGTYLASYSS LPEFELALWN WESSIILCKK SQPGMDVNQM SFNPMNWRQL
101 CLSSPSTVSV WTIERSNQEH CFRARSVKLP LEDGSFFNET DVVFPQSLPK
151 DLIYGPVLPL SAIAGLVGKE AETFRPKDDL YPLLHPTMHC WTPTSDLYIG
201 CEEGHLLMIN GDTLQVTVLN KIEEESPLED RRNFISPVTL VYQKEGVLAS
251 GIDGFVYSFI IKDRSYMIED FLEIERPVEH MTFSPNYTVL LIQTDKGSVY
301 IYTFGKEPTL NKVLDACDGK FQAIDFITPG TQYFMTLTYS GEICVWWLED
351 CACVSKIYLN TLATVLACCP SSLSAAVGTE DGSVYFISVY DKESPQVVHK
401 AFLSESSVQH VV

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8p7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_8p7, frame 2 

Report for DKFZphtes3_8p7.2

[LENGTH] 412

[MW] 46476.62

[pI] 4.91

[KW] Alpha_Beta

SEQ MATNIPCEVVAFSDRKLKPLIYVYSFPGLTRRTKLKGNILLDYTLLSFSYCGTYLASYSS
PRD cccccceeeeeecccccceeeeeecccccccccccchhhhhhhheeeecccccccccccc 

SEQ LPEFELALWNWESSIILCKKSQPGMDVNQMSFNPMNWRQLCLSSPSTVSVWTIERSNQEH 
PRD cchhhhhhhhccccceeeccccccccceeeccccccceeeeeccccceeeeeeeecchhh 

SEQ CFRARSVKLPLEDGSFFNETDVVFPQSLPKDLIYGPVLPLSAIAGLVGKEAETFRPKDDL 
PRD hhhhhhhcccccccccccccccccccccccccccccccceeeeeeccccccccccccccc 

SEQ YPLLHPTMHCWTPTSDLYIGCEEGHLLMINGDTLQVTVLNKIEEESPLEDRRNFISPVTL 
PRD cccccccccccccccceeeecccceeeecccceeeeeehhhhhcccccccccccccccee 

SEQ VYQKEGVLASGIDGFVYSFIIKDRSYMIEDFLEIERPVEHMTFSPNYTVLLIQTDKGSVY 
PRD eeeceeeeecccceeeeeeeeeccchhhhhhhhhhcccceeeccccceeeeeecccccee 

SEQ IYTFGKEPTLNKVLDACDGKFQAIDFITPGTQYFMTLTYSGEICVWWLEDCACVSKIYLN
PRD eeeccccccchhhhhcccccceeeeeccccceeeeeeeccceeeeeeecceeeeeeeehh 

SEQ TLATVLACCPSSLSAAVGTEDGSVYFISVYDKESPQVVHKAFLSESSVQHVV 
PRD hhhhhhhccccccceeeeccccceeeeeeeccccccchhhhhhhcccccccc 


(No Prosite data available for DKFZphtes3_8p7.2)
(No Pfam data available for DKFZphtes3_8p7.2) 
DKFZphtes3_9e22


group: testes derived

DKFZphtes3_9e22 encodes a novel 227 amino acid protein with weak partial similarity to
RING-finger proteins.

For the novel protein, Pfam, but not Prosite, predicts a C3HC4 type RING finger motif.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.


similarity to zinc finger proteins

Sequenced by DKFZ

Locus: unknown

Insert length: 1318 bp

Poly A stretch at pos. 1308, no polyadenylation signal found


1 GCTCCCCCGG CTTTCGGAGC CCGGGGGCGG CCTGTGGCGC GCGGAGCCCG 
51 CGCCGGACTG CGCCTCTTTG GACCTTGAGG GGAAACATGC GTTTGCCTTG 
101 GATCGTTTGA AATTCTAAGT TTGGGATCCC CGCCCGCCCG CCTGCCTCTT 
151 CCGCCCCGCG GGTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC 
201 CCTCCGGGTC TCCTTTTTGA CTCCCTCCCC CTTTATGCTC GCCCAGCCCT 
251 CCCCCTGCTG CTGAGAAGTG GGGGAGGGTC TCGGCCTCCA GGTTCCCGCC 
301 CCACCGGGGC CCGGGCGAGC ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC
351 TCCCGGGGCC CCTTCCCGGG GGTCTCCACC GATGACAGCG CCGTGCCGCC 
401 GCCGGGAGGG GCGCCCCATT TCGGGCACTA CCGGACGGGC GGCGGGGCCA 
451 TGGGGCTGCG CAGCCGCTCG GTCAGCTCGG TGGCAGGCAT GGGCATGGAC 
501 CCCAGCACGG CCGGGGGGGT GCCCTTTGGC CTCTACACCC CCGCCTCCCG 
551 GGGCACCGGC GACTCCGAGA GGGCGCCCGG CGGCGGAGGG TCTGCGTCCG 
601 ACTCCACCTA TGCCCATGGC AATGGTTACC AGGAGACGGG CGGCGGTCAC 
651 CATAGAGACG GGATGCTGTA CCTGGGCTCC CGAGCCTCGC TGGCGGATGC 
701 TCTACCTCTG CACATCGCAC CCAGGTGGTT CAGCTCGCAT AGTGGTTTCA 
751 AGTGCCCCAT TTGCTCCAAG TCTGTGGCTT CTGACGAGAT GGAAATGCAC 
801 TTTATAATGT GTTTGAGCAA ACCTCGCCTC TCCTACAACG ATGATGTGCT 
851 GACTAAAGAC GCGGGTGAGT GTGTGATCTG CCTGGAGGAG CTGCTGCAGG 
901 GGGACACGAT AGCCAGGCTG CCCTGCCTGT GCATCTATCA CAAAAGCTGC 
951 ATAGACTCGT GGTTTGAAGT GAACAGATCT TGTCCGGAAC ACCCTGCGGA 
1001 CTGACCTGCG GGCTTGCTTG CTGACTCCTC TCAAAGGGAC AGAGCGCCCC 
1051 TGCTCCAGGG AGGAGGCTCA CCGGACCCTG GGGCAGAGCT GAGCTTGGGA 
1101 CACCAGCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT 
1151 CTCCCTTCCT CCCTGAGGAC ACCAAATTGG ATGAGAGCAA GTTTGAGAGA 
1201 AGAATGAATC AACTGCTATC CTTCCCCTCA CCCCTCAGCC CAGGAGGGAA 
1251 AGGGCATTTT CTTTTTCATC TTTGAAAGGC ATTGTGGGTC TGTCTTTAAA 
1301 GTGTTTACAA AAAAAAAA 


BLAST Results 


No BLAST result 


Medline entries 


No Medline entry 


Peptide information for frame 3 


ORF from 321 bp to 1001 bp; peptide length: 227 
Category: similarity to known protein 
Classification: unclassified 


1 MGGKQSTAAR SRGPFPGVST DDSAVPPPGG APHFGHYRTG GGAMGLRSRS 
51 VSSVAGMGMD PSTAGGVPFG LYTPASRGTG DSERAPGGGG SASDSTYAHG 
101 NGYQETGGGH HRDGMLYLGS RASLADALPL HIAPRWFSSH SGFKCPICSK 
151 SVASDEMEMH FIMCLSKPRL SYNDDVLTKD AGECVICLEE LLQGDTIARL 
201 PCLCIYHKSC IDSWFEVNRS CPEHPAD

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_9e22, frame 3

TREMBL:AF078823_1 product: "RING-H2 finger protein RHA2b"; Arabidopsis 
thaliana RING-H2 finger protein RHA2b mRNA, complete cds., N = 1, Score 
= 111, P = 2.8e-06 

TREMBL:AF078822_1 product: "RING-H2 finger protein RHA2a"; Arabidopsis
thaliana RING-H2 finger protein RHA2a mRNA, complete cds., N = 1, Score
= 112, P = 6.6e-06

TREMBL:AC004138_14 gene: "T17M13.17"; Arabidopsis thaliana chromosome
II BAC T17M13 genomic sequence, complete sequence., N = 2, Score = 123,
P = 1.4e-05

PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana, N = 1,
Score = 142, P = 8.8e-08


>PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 
Length = 327 

HSPs: 

Score = 142 (21.3 bits), Expect = 8.8e-08, P = 8.8e-08
Identities = 24/57 (42%), Positives = 30/57 (52%)

Query: 166 SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP 222
           S P +      LT D  +C +C+EE + G     LPC  IYHK CI  W  +N SCP
Sbjct: 206 SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP 262


Pedant information for DKFZphtes3_9e22, frame 3


Report for DKFZphtes3_9e22.3


[LENGTH] 227

[MW] 23782.62

[pI] 6.18

[HOMOL] PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 2e-08

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR313c] 4e-06

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOL013c] 0.001

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular


SEQ MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD

PRD cccccccccccccccccccccccccccccccccccccccccccccccccceeeccccccc 

SEQ PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeech 

SEQ RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD 

PRD hhhhhhhhceeecccccccccccccccccccchhhhhhhhhhhhcccccccccccccccc 

SEQ AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD 

PRD cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc 


(No Prosite data available for DKFZphtes3_9e22.3) 


Pfam for DKFZphtes3_9e22.3 


HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC* 

C IC L+++ D++ LPC+ ++ ++CI +W CP+ 

Query 184 CVIC LEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEH 224 
group: testes derived 

DKFZphtes3_9i20 encodes a novel 205 amino acid protein with similarity to human KIAA0336. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


unknown 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 

Locus: /map="44.1 cR from top of Chr17 linkage group" 
Insert length: 2509 bp 

Poly A stretch at pos. 2499, polyadenylation signal at pos. 2481 


1 CTCGCCGAGA TGACCTGGGC ACCTCTGCGT TGAATCGGCA AATACTGATC 
51 AAGCCGCATT TATTCTGCTC TCAGGAACTC TAAGTCTAGC AGAGAAGATG 
101 AGGCGGTAGA AGTTCATCAA TGGCTTGGCT GGAGGACAAG CAAATTGAGG 
151 ACATTGGCAA CGGAGTGATC AAAATGATAG ATCATGAGGC CTAAAATGAA 
201 TAAGGAAAGA AGAGAAGTGG CAGAGGCTGA GAACAGAAAG AGAGGGTGGA 
251 GGGGCTGTAA ATCTTGAAGA TTAGGGTATA ATATGAGTAT ATGGGTAAGA 
301 ATTGGAAGAA TTGTGTAGGA GGCAGTAGTC AAAAAGTAGA AGCAGTTTGG 
351 AAGAGTAGTT ACAAATATCA AGAGCCAGGT GGCTAAAAGG TGGAGCTATA 
401 GGTCATTGAA GCTCAAGAAA CTGAGTCTCT AGGGCATTGG TTAAGTCATC 
451 TGTCTAGACT TCAAAGTTGT CTAGGATGAT AATTCAGAAG ACTGATCTGT 
501 GCCAAAGTCA CAGGTTTTTC ACGACTGAAA ACAACATAGC AAAATAAGCC 
551 AAGATGTCTG TGGATCCAAT GACCTACGAG GCCCAGTTCT TTGGCTTCAC 
601 GCCACAAACG TGCATGCTTC GGATCTACAT TGCATTTCAA GACTACCTAT 
651 TTGAAGTGAT GCAGGCCGTT GAACAGGTTA TTCTGAAGAA GCTGGATGGC 
701 ATCCCAGACT GTGACATTAG CCCAGTGCAG ATTCGCAAAT GCACAGAGAA 
751 GTTTCTTTGC TTCATGAAAG GACATTTTGA TAACCTTTTT AGCAAAATGG 
801 AGCAACTGTT TTTGCAGCTG ATTTTACGTA TTCCCTCAAA CATCTTGCTT 
851 CCTGAAGATA AATGTAAGGA GACACCTTAT AGTGAGGAAG ATTTTCAGCA 
901 TCTCCAGAAA GAAATTGAAC AGTTACAGGA GAAGTACAAG ACTGAATTAT 
951 GTACTAAGCA GGCCCTTCTT GCAGAATTAG AAGAGCAAAA AATTGTTCAG 
1001 GCCAAAGTCA AACAGACGTT GACTTTCTTT GATGAGCTTC ATAATGTTGG 
1051 CAGAGATCAT GGGACTAGTG ATTTTAGGGA GAGTTTAGTA TCCCTGGTTC 
1101 AGAACTCCAG AAAACTACAG AACATTAGAG ACAATGTGGA AAAGGAATCG 
1151 AAACGACTGA AAATATCTTA ATTGCTCAGT AGTCAAAAGG AGGAGCCTGT 
1201 CAAAAAGTAG AATCATAAGG ACTGTTCAAA CCATAAGGAC TGTTCAAATC 
1251 ATACCAGTGA CTGTTCAAAC CAACCATACT TTTTATTAGA TTTGCTTTGT 
1301 CAACTCTTTC TTGTATTCTG TGTTTTCCTC TTTTTTGGTC CACTTTGCTG 
1351 AGGTATGAAG TGTACTACTT TGAACTAGGC TGAAGCATCT GAGTCTTCTA 
1401 ATAAGTGGGA AGGGATCCAA CAAAGAAGCC ATGACCAGTT AAAGATATTT 
1451 GCAGAGTTAC ACCTTGGTCA TAAGTCCTTT GTGACCTTGA TTATTTTGGC 
1501 TTACTCTTTG GATGAGACCA GACAAGAAAA GGATTAAACG GGTGGCTCCT 
1551 TTAATATTAT TATTATTGTT TTTGAGACAA GGTCCCTTTC TGTCACCCAG 
1601 GTTAGAGTAG ATTTCAGTGG CACAATCTTG GCTCACTGCA ACCTCTGTGT 
1651 CCTGGGCTCA AGTGATCCTC CTGCCTCAGC CTCCCAAGTA GCTAGGACCA 
1701 CAGGTGCGTG TCACCATGCT TGGCTAATTT TTTTGCAGAA ACGAGGCCTC 
1751 ACTATATTGT CCAGGCTGAG TGGCTCTTTT ATTAACCAGT CATTACACTG 
1801 CGGAACAGCC AACATAGAGT ACTTGCTCTC GTCCTGTGAA TTTTCTTTCA 
1851 TGAGGGAGTC AATATGTAGT GGAAAGAAGC ATGTAGCAAA AAAGACAACC 
1901 TTGATCTTTA ATAAAAAAGA AGTTGGTTTA TTTCCAAAAT AAATCCCCTG 
1951 ACAAAAAACC TGGTGATGTT AAGCAATTGA CTGTCTTAGA GTCCAGCAGA 
2001 AGACCTTAGA CAAAAAAAGC AGAACCCACT GGAGTAGAAA AGGAAGCATG 
2051 TAGCATATAC TCAGTAGTGA AATTTAATTT TACTGACTGT TAGGTATCTA 
2101 TGCCAATTTG TTTTCATACT TCAGTTGGTT TTGGAATCTG CCTTATACCT 
2151 AATATTTATT TATTCACACT CATAAGCATC AAATATTTAA TGCCCTCAGT 
2201 GGGAAATTTG TGTTTAAACT CAATGGAATC TAATATTTCT TTATGTCGTT 
2251 AGTCCCTGTA AAATGTTAGG TCACCCAAGG AAAGGGGAGA AATAGCAATG 
2301 GTTGTTCCTA AGGTATTGCT TGCCCTCCAT GTCTTCCTAA AGAGCAGAAC 
2351 TTGGAGTTTC TCCTTTATGT AGAGAAGAAG TAACTTAGGG TGTATTTGCA 
2401 ATGAAATATT CATAGATATT GAAAGCTTGT GTTTACATGA AATATGTTTA 
2451 TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA 
2501 AAAAAAAAA 


BLAST Results 
Entry AC004148 from database EMBL: 

Homo sapiens chromosome 17, clone HCIT524C5, complete sequence. 
Score = 5245, P = 0.0e+00, identities = 1049/1049 
3 exons 

Entry HS556361 from database EMBL: 
human STS TIGR-AOOaNag. 

Score = 1005, P = 1.3e-39, identities = 201/201 

Entry HSG043 from database EMBL: 
human STS SHGC-36031. 
Score = 955, P = 2.8e-37, identities = 205/215 


Medline entries 


No Medline entry 


Peptide information for frame 2 


ORF from 554 bp to 1168 bp; peptide length: 205 
Category: putative protein 
Classification: no clue 


1 MSVDPMTYEA QFFGFTPQTC MLRIYIAFQD YLFEVMQAVE QVILKKLDGI 
51 PDCDISPVQI RKCTEKFLCF MKGHFDNLFS KMEQLFLQLI LRIPSNILLP 
101 EDKCKETPYS EEDFQHLQKE IEQLQEKYKT ELCTKQALLA ELEEQKIVQA 
151 KLKQTLTFFD ELHNVGRDHG TSDFRESLVS LVQNSRKLQN IRDNVEKESK 
201 RLKIS 
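
The peptide length quoted for an ORF follows from its coordinates. The sketch below assumes the ORF coordinates are 1-based, inclusive, and end on the last sense codon (the stop codon is not included), which is consistent with the figures reported in this listing; the function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp, end_bp):
    """Peptide length for an ORF given 1-based inclusive coordinates.

    Assumes the interval covers whole codons and excludes the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3

# ORF from 554 bp to 1168 bp, as reported for frame 2 above.
print(orf_peptide_length(554, 1168))
```

A span that is not a multiple of three raises an error, which is a cheap way to catch mis-scanned coordinates.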

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9i20, frame 2 

TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene, 
complete cds., N = 1, Score = 107, P = 0.0081 


>TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene, 
complete cds. 

Length = 1,583 

HSPs: 

Score = 107 (16.1 bits), Expect = 8.2e-03, P = 8.1e-03 
Identities = 42/140 (30%), Positives = 76/140 (54%) 

Query:  65 EKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEED----FQHLQKE 120 
           EK  CF+K H +NL   +EQ   +L  R    ILL +D   ++P  + D     + L+++ 
Sbjct: 796 EKEKCFIKEH-ENLKPLLEQK--ELRDRRAELILL-KDSLAKSPSVKNDPLSSVKELEEK 851 

Query: 121 IEQLQE--KYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESL 178 
           IE L++  K K E   K  L+A ++ +K +  +  K+T T  +EL ++  +        S+ 
Sbjct: 852 IENLEKECKEKEEKINKIKLVA-VKAKKELDSSRKETQTVKEELESLRSEK--DQLSASM 908 

Query: 179 VSLVQNSRKLQNIRDNVEKESKRLKI 204 
             L+Q +   +N+    EK+S++L + 
Sbjct: 909 ROLZQGAESYKNLLLEYEKQSEQLDV 934 


Pedant information for DKFZphtes3_9i20, frame 2 


Report for DKFZphtes3_9i20.2 


[LENGTH] 205 

[MW] 24140.13 

[pI] 5.51 

[KW] All_Alpha 

[KW] COILED_COIL 18.05 % 
SEQ MSVDPMTYEAQFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVILKKLDGIPDCDISPVQI 

PRD cccccchhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

COILS 

SEQ RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEEDFQHLQKE 

PRD cccchhhhhhhcccccchhhhhhhhhhhhhhhcccceeeccccccccccchhhhhhhhhh 

COILS                                                   CCCCCCCCCC 

SEQ IEQLQEKYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESLVS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LVQNSRKLQNIRDNVEKESKRLKIS 

PRD hhcccchhhhhhhhhhhhhhhcccc 

COILS 


(No Prosite data available for DKFZphtes3_9i20.2) 
(No Pfam data available for DKFZphtes3_9i20.2) 
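
The percentage attached to a keyword such as COILED COIL appears to be the share of residues flagged in the corresponding annotation track. A minimal Python sketch under that assumption follows; the helper name `flagged_percent` and the set of flag characters are illustrative choices, not part of the PEDANT output format.

```python
def flagged_percent(track, protein_length, flag_chars="CcXx"):
    """Percentage of residues marked in an annotation track.

    `track` is the concatenation of the per-line track strings (for
    example all COILS lines of one report); unflagged positions may be
    blanks or simply absent.
    """
    flagged = sum(1 for ch in track if ch in flag_chars)
    return round(100.0 * flagged / protein_length, 2)

# The COILS track above flags 10 + 27 residues of the 205-residue protein.
coils_track = "C" * 10 + "C" * 27
print(flagged_percent(coils_track, 205))
```

With 37 flagged residues out of 205 the function returns 18.05, matching the reported keyword value.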
DKFZphtes3_9k22 


group: testes derived 

DKFZphtes3_9k22 encodes a novel 304 amino acid protein with partial similarity to X. laevis 
katanin p80. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 


similarity to C-terminus of katanin p80 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2676 bp 

Poly A stretch at pos. 2665, no polyadenylation signal found 


1 CTCTCTAGGC TGCCGGGCGC TGGTCGTCAG CGCCGAGGCT GGGCTGAGGC 
51 GCCGCGGTAC CATGAGGCGC CGGTACTTAA GAGATTATGG CATCAGAAAC 
101 CCACAATGTT AAAAAACGGA ACTTTTGTAA TAAGATTGAG GATCATTTCA 
151 TTGATCTTCC TAGAAAAAAG ATCTCTAATT TCACTAATAA GAACATGAAG 
201 GAGGTTAAGA AATCTCCAAA ACAGTTGGCT GCTTACATAA ATAGAACAGT 
251 TGGACAAACT GTGAAAAGCC CAGATAAACT TCGTAAAGTG ATCTATCGCA 
301 GAAAGAAAGT TCATCATCCC TTTCCAAATC CTTGTTACAG AAAAAAACAG 
351 TCCCCTGGAA GTGGGGGCTG TGACATGGCA AATAAAGAAA ATGAACTGGC 
401 TTGTGCAGGC CACCTGCCTG AAAAATTACA CCATGATAGT CGAACATATT 
451 TGGTTAACTC CAGTGATTCT GGTTCTTCAC AGACAGAAAG CCCATCATCA 
501 AAATATAGTG GGTTTTTTTC TGAGGTTTCT CAGGACCATG AAACAATGGC 
551 CCAAGTTTTG TTCAGCAGGA ATATGAGATT GAATGTAGCT TTAACTTTCT 
601 GGAGAAAGAG AAGTATAAGT GAACTTGTAG CTTATTTGTT GAGGATAGAA 
651 GATCTTGGCG TTGTGGTAGA TTGCCTTCCT GTGCTCACCA ATTGTTTACA 
701 GGAAGAAAAA CAATATATCT CACTTGGCTG CTGTGTTGAC TTGTTGCCTC 
751 TAGTAAAGTC ACTACTTAAA AGCAAATTTG AAGAATATGT TATAGTTGGT 
801 TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGGTCAG AACTATCATC 
851 CAAAACAGAA ATTATAAATG ATGGAAATAT TCAAATTTTA AAACAACAAT 
901 TAAGTGGATT ATGGGAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT 
951 ACTGGTAATA TAGCTAAGGA TGTAGATGCT TATTTATTAC AGTTACATTG 
1001 AGAGATTTCA TCTACTAAAG AGCATTTGGT TTTTCAAAAC ATCCCTGAAC 
1051 TGTATAATTT ACAAAAAAAA AAGTCTCGTC TGAGAACTGT GAACTGTGGA 
1101 AGAAATCAAA ACTATTTTTT CTTTTAAAAA GCCACGTAAT GAAACCACTA 
1151 ATGAAATCCC AGCAATCTGC TTCACATTGA AGTGGAAAAA TATCCAAAAG 
1201 GAGCAGCTTC AATTTCATTG AGGTGAAAGT GCACTATGAA GATTGTTCAC 
1251 CTTTGCTGCA TTTGGGAGTT ATATGGTTAT TTGGTAACAT TAAGAACTAC 
1301 TGGATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATGTGAAAA 
1351 AATAAGACAG GACTTACCAC TAGGAACCAC CAAGACCAAT CATCATTAAC 
1401 TTTTTTAAGA TTGTGTTTTA TTAAAAAAAA AAAACACTTA AATGTGTGCA 
1451 GCTATTTTCT TATGTTGAAA AGACTGAAAG TTTAAAACAT GAAAAAAATC 
1501 AATATTAAAC ATTTTTTGTT CACACTGAGA TACTGTGTAT GTAAAATGCC 
1551 TTAATTATTA ATAAGCCAAT GTGTTATGAT ACCAATATCT GTTTTAAAAA 
1601 ACTAAAACCA ACCATGCTTC TGGCATGATA AAATCATGGA ATTAAATCAG 
1651 GGGTTTACAT TCTTGTAGAG TGTTCTTGAA ACACTCTCTG CACCATTTTT 
1701 AAAACTTGAG AATAGTTTTA GTATCTCTGA TATTTTTTGC CAGAATCATC 
1751 ATGTCATGTA TGAATGTGTT ATCCCTATCT AAGGAAAAAG GTGAATATGT 
1801 TTTTGTATGA ATGTTTAACT GGAAATGTCC ATGGACTTGG CTAATTTATA 
1851 TTTACTTTTT ATTGTACATA GATTTCTAAT ATTTTTCATT CCTGTATCAT 
1901 TTAAACTTCC TTCATTTGAG TAT^TTCACT AAATATTTCT ATTTTTTTGC 
1951 TTTTTTAAAT TCTGATTTTA TATGAATTCT AATTCTTTTT CACTACATAT 
2001 GTTTTAAAGA GTTACATACA GTGATTTAGA ATGGTTTACA GTTAATGCTG 
2051 ATCTTGTATT TTAAATTCCA ACACTTTGTG TCACTACCTC CTCTAATGGT 
2101 TAGTATGATA TGCTAGCAGA CTGTATGAGG TCTTTTTTTA AAATACCACT 
2151 TTTAGTGTCA GTGAACCAAA TTCTGGAATG TCTTAACAGC TCTAAATCTT 
2201 ACTTGTCTTG AAAATGATTG GGGTTTAATA CCACTGCTGG TGGTTCACAC 
2251 ATCATCCCAT CCTTAATATG CCTGACAGGC ATCTGAGCAA AGGTTTTTAG 
2301 TAATTGAATT TCTCTGCAGT AGTCCTTCAA GCACTTGAAT GTAAACCTTT 
2351 AGCATTTATT CGTTTAATGA CTACTGATAC GAATCTCAAG CAGATTTCTT 
2401 GCTCTTAAAA GTTATGTTTC ACTGAGTTCT GGTTTTGTGT AGCTATATTT 
2451 TATATAGCTA GATATTCCTC ACAGTGAACA TGAATTGTAA TAATTGGTTA 
2501 TTTCCTTAAG TCTTTAGATT ATAATAATTT CAGATTATTG CACGTCTGTG 
2551 ATTTGAGAGG TGAGTTATTT AAGAGGCCAG TTTTCAGGAC ATGGGAATTT 
2601 GAATTGTAAA CCTGTTATCT CTGTGAAACT TTTAACATGA TAAAATATAA 
2651 CCTTTCTTTG TGCTTAAAAA AAAAAA 
BLAST Results 


Entry HS541354 from database EMBL: 
human STS WI-11840. 
Score = 1267, P = 7.1e-50, identities = 271/281 


Medline entries 


98227670: 

Katanin, a microtubule-severing protein, is a novel AAA ATPase 
that targets to the centrosome using a WD40-containing subunit. 


Peptide information for frame 3 


ORF from 87 bp to 998 bp; peptide length: 304 
Category: similarity to known protein 
Classification: unclassified 


1 MASETHNVKK RNFCNKIEDH FIDLPRKKIS NFTNKNMKEV KKSPKQLAAY 
51 INRTVGQTVK SPDKLRKVIY RRKKVHHPFP NPCYRKKQSP GSGGCDMANK 
101 ENELACAGHL PEKLHHDSRT YLVNSSDSGS SQTESPSSKY SGFFSEVSQD 
151 HETMAQVLFS RNMRLNVALT FWRKRSISEL VAYLLRIEDL GVVVDCLPVL 
201 TNCLQEEKQY ISLGCCVDLL PLVKSLLKSK FEEYVIVGLN WLQAVIKRWW 
251 SELSSKTEII NDGNIQILKQ QLSGLWEQEN HLTLVPGYTG NIAKDVDAYL 
301 LQLH 

BLASTP hits 

NO BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9k22, frame 3 

TREMBL:AF056021_1 product: "p80 katanin"; Xenopus laevis p80 katanin 
mRNA, partial cds., N = 1, Score = 146, P = 1.2e-07 

TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin 
p80 subunit mRNA, complete cds., N = 1, Score = 150, P = 1.2e-07 

TREMBL:AF052433_1 product: "katanin p80 subunit"; Strongylocentrotus 
purpuratus katanin p80 subunit mRNA, complete cds., N = 2, Score = 146, 


>TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin p80 
subunit mRNA, complete cds. 
Length = 655 


HSPs: 


Score = 150 (22.5 bits), Expect = 1.2e-07, P = 1.2e-07 
Identities = 35/105 (33%), Positives = 55/105 (52%) 

Query: 145 SEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISELVAYLLRIEDLGVVVDCLPVLTNCL 204 
           S++ + H+TM  VL SR+  L+     W    I   V   + I DL VVVD L    N + 
Sbjct: 489 SQIRKGHDTMCVVLTSRHKNLDTVRAVWTMGDIKTSVDSAVAINDLSVVVDLL----NIV 544 

Query: 205 QEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLNWLQAVIKRW 249 
            ++     L  C  +LP ++ LL+SK+E YV  G   L+ +++R+ 
Sbjct: 545 NQKASLWKLDLCTTVLPQIEKLLQSKYESYVQTGCTSLKLILQRF 589 

Pedant information for DKFZphtes3_9k22, frame 3 


Report for DKFZphtes3_9k22.3 


[LENGTH] 304 

[MW] 34767.24 

[pI] 9.18 

[KW] All_Alpha 
[KW] LOW_COMPLEXITY 3.95 % 

SEQ MASETHNVKKRNFCNKIEDHFIDLPRKKISNFTNKNMKEVKKSPKQLAAYINRTVGQTVK 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccccc 

SEQ SPDKLRKVIYRRKKVHHPFPNPCYRKKQSPGSGGCDMANKENELACAGHLPEKLHHDSRT 

SEG 

PRD ccchhhhhhhhhhhcccccccccccccccccccccccccchhhhhhccccccccccccce 

SEQ YLVNSSDSGSSQTESPSSKYSGFFSEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISEL 

SEG 

PRD eeecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VAYLLRIEDLGVVVDCLPVLTNCLQEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhcceeeeeeccchhhhhhhhceeeccceeeehhhhhhhhhhhheeeeeeehh 

SEQ WLQAVIKRWWSELSSKTEIINDGNIQILKQQLSGLWEQENHLTLVPGYTGNIAKDVDAYL 

SEG 

PRD hhhhhhhhhhhhcccceeeeccccccccccccchhhhhhhhhhccccccccchhhhhhhh 

SEQ LQLH 

SEG 

PRD hccc 

(No Prosite data available for DKFZphtes3_9k22.3) 
(No Pfam data available for DKFZphtes3_9k22.3) 
Localization 

None 
None 
Mitochondria 
Endoplasmic Reticulum 
Nucleus 
Cytosol + Nucleus 
Endoplasmic Reticulum 
Cytosol + Nucleus 
Nucleus 

Localization Predicted 

"secr pathway" 
"no predict" 
"mitochondria" 
"no predict" 
"no predict" 
"nucleus" 
"mitochondria" 
"no predict" 
"nucleus / nuclear envelope" 

ChromLocation STS 

512.1 cR from top of Chr10 linkage group 
238.7 cR from top of Chr20 linkage group 



Similarity 

similar to: kinesin like proteins 
similar to: Drosophila chromatin protein 
similar to: acyltransferase 
unknown 
similar to: MG21; contains three conserved protein motifs present in GTP-binding proteins, but these are not conserved in 2_2a3.1 
similar to: origin recognition complex 
similar to: protein involved in energy metabolism 
unknown, contains 2 WD-40 repeats, which are typical for the beta-transducin subunit of G-proteins 
similar to: RNA helicase 

ProteinGroup 

transport and traffic 
differentiation & development 
signaling & communication 
differentiation & 
cell cycle 
metabolism 
communication 
nucleic acid management 
Localization 

Nucleus 
Endoplasmic Reticulum 
Cytosol 
Golgi 
Nuclear envelope 
Nucleus 
Cytosol 
Cytosol 
Cytosol + Nucleus 
Cytosol + Nucleus 

Localization Predicted 

"cytoskeleton / plasma membrane" 
"no predict" 
"no predict" 
"no predict" 
"nucleus / nuclear envelope" 
"no predict" 
"no predict" 
"no predict" 
"no predict" 
"membranes" 

ChromLocation STS 

86.2 cR from top of Chr1 linkage group 
87.50 cR from top of Chris linkage group 
200.5 cR from top of Chr3 linkage group 

Similarity 

shares the features of mayven and kelch and therefore should be involved in the organisation of cytoskeleton binding to membrane proteins 
unknown 
unknown 
unknown 
similar to: DEAD-box helicase 
similar to: neuronal calcium sensor 
similar to: GTP-binding protein 
unknown 
unknown 
similar to: calmodulin-related protein 

ProteinGroup 

structure & motility 
membrane protein 
unknown 
unknown 
nucleic acid management 
signaling & communication 
signaling & communication 
unknown 
unknown 
signaling & communication 
Localization 

Endoplasmic Reticulum 
Nucleus 
[illegible] 
Cytosol 
Nucleus 
Endoplasmic Reticulum 
other/unknown 
Cytoskeleton (microtubules) 
Peroxisomes 
Endoplasmic Reticulum 

Localization Predicted 

"no predict" 
"nucleus" 
"secr pathway / endosomes" 
"membranes" 
"no predict" 
"no predict" 
"nucleus" 
"no predict" 
"peroxisomes" 
"no predict" 

ChromLocation STS 

171.7 cR from top of Chr14 linkage group 
311.4 cR from top of Chr14 linkage group 

Similarity 

unknown 
similar to: protein involved in cell cycle, DNA repair, maintenance of minichromosomes 
unknown 
similar to: sorting nexin 7 
unknown 
similar to: DnaJ proteins, but lacks CRR domain of these proteins 
unknown 
unknown 
unknown 
unknown 

ProteinGroup 

membrane protein 
signaling & 
differentiatio 
Localization 

Cytosol + Nucleus 
Cytosol + Nucleus 
[illegible] 
Nucleus 
Cytosol + Nucleus 
Plasma membrane + cell contact sites 
Cytosol 
Golgi + Plasma membrane 
Endoplasmic Reticulum 
Golgi + plasma 

Localization Predicted 

"no predict" 
"no predict" 
"no predict" 
"no predict" 
"no predict" 
"plasma membrane" 
"cytosol or nucleus" 
"no predict" 
"no predict" 
"secr pathway" 

ChromLocation 
STS 
Prosite Key 

NAME: N-glycosylation site. 
CONSENSUS: N-{P}-[ST]-{P}. 

NAME: Glycosaminoglycan attachment site. 
CONSENSUS: S-G-x-G. 

NAME: Tyrosine sulfation site. 

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site. 
CONSENSUS: [RK](2)-x-[ST]. 

NAME: Protein kinase C phosphorylation site. 
CONSENSUS: [ST]-x-[RK]. 

NAME: Casein kinase II phosphorylation site. 
CONSENSUS: [ST]-x(2)-[DE]. 

NAME: Tyrosine kinase phosphorylation site. 
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y. 

NAME: N-myristoylation site. 
CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. 

NAME: Amidation site. 
CONSENSUS: x-G-[RK]-[RK]. 
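
The CONSENSUS entries in this key use PROSITE pattern syntax: elements are separated by `-`, `x` stands for any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The sketch below translates a pattern into a Python regular expression and scans a sequence for it. It is an illustration under those syntax rules only; the function names `prosite_to_regex` and `find_sites` are hypothetical, and the example sequence is made up.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (2) or (2,3).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_sites(sequence, pattern):
    """Return (1-based start, matched text) for every occurrence."""
    # The lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("MKNFTNKNRTV", "N-{P}-[ST]-{P}"))
```

Short patterns such as the N-glycosylation site match frequently by chance, so a hit found this way indicates only that the motif is present, not that the site is used.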

NAME: Aspartic acid and asparagine hydroxylation site. 
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C. 

NAME: Vitamin K-dependent carboxylation domain. 
CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(9)-[FYW]. 

NAME: Phosphopantetheine attachment site. 
CONSENSUS: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]- 
CONSENSUS: {PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]- 
CONSENSUS: x(2)-[LIVMFA]. 

NAME: Acyl carrier protein phosphopantetheine domain profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment site. 
CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. 

NAME: Prokaryotic N-terminal methylation site. 
CONSENSUS: [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-E-[LIVMFWSTAG](14). 

NAME: Prenyl group binding site (CAAX box). 
CONSENSUS: C-{DENQ}-[LIVM]-x>. 

NAME: Protein splicing signature. 
CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC]. 

NAME: Endoplasmic reticulum targeting sequence. 
CONSENSUS: [KRHQSA]-[DENQ]-E-L>. 

NAME: Microbodies C-terminal targeting signal. 
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>. 

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide. 
CONSENSUS: L-P-x-T-G-[STGAVDE]. 

NAME: Bipartite nuclear targeting sequence. 

NAME: Cell attachment sequence. 
CONSENSUS: R-G-D. 

NAME: ATP/GTP-binding site motif A (P-loop). 
CONSENSUS: [AG]-x(4)-G-K-[ST]. 

NAME: Cyclic nucleotide-binding domain signature 1. 
CONSENSUS: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G. 

NAME: Cyclic nucleotide-binding domain signature 2. 
CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]. 
NAME: cAMP/cGMP binding motif. 
NAME: EF-hand calcium-binding domain 

NAME: Actinin-type actin-binding domain signature 1. 
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N. 

NAME: Actinin-type actin-binding domain signature 2. 

rm^PM^u^*' fL|VJ2-x-[SGN]-(LIVM]-[DAGHE]-[SAG]-x.[DNEAGJ-IUVM].x-[D^^^ 
CONSENSUS: [LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2). 

NAME: Anaphylatoxin domain signature. 

CONSENSUS- |^|5^1■^^■^g•^AP^*<7•«HGA!^^>EQR]<:^OASTDEQL].^^^ 

NAME: Anaphylatoxin domain profile. 


NAME: Apple domain. 

CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T- 
CONSENSUS: x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)- 
CONSENSUS: S-G-x-[ST]-[LIVMFY]-x(2)-C. 


NAME: Band 4.1 family domain signature 1. 

rnl^^cMCMo^ ^:fLlVj-x(3)-[KRQ]-x-[LIVM]-x(2).[QH]-x(0,2HUVMF]-x^ 
CONSENSUS: x(3,5)-F-[FY]-x(2)-[DENS]. 

NAME: Band 4.1 family domain signature 2. 

f"^-^(9)-IDENQSTvi-[SA3.x(3).[FY].[LIVMl-x(2HACVl-x(2HU^-x(2 
CONSENSUS: [FY]-G-x-[DENQST]-[LIVMFYS]. 

NAME: Band 4.1 family domain profile. 

NAME: C1q domain signature. 
CONSENSUS: F-x(5)-[ND]-x(4)-[FYWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY]. 

NAME: C-terminal cystine knot signature. 
CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C. 

NAME: C-terminal cystine knot profile. 

NAME: CUB domain profile. 

NAME: Death domain profile. 

NAME: EGF-like domain signature 1. 
CONSENSUS: C-x-C-x(5)-G-x(2)-C. 

NAME: EGF-like domain signature 2. 
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C. 

NAME: Calcium-binding EGF-like domain pattern signature. 
CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C. 

NAME: Laminin-type EGF-like (LE) domain signature. 
CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C. 

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1 

con™; ^S^-^'>^-^--^^^HOsium^ 

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2. 
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C. 

NAME: Forkhead-associated (FHA) domain profile. 

NAME: Fibrinogen beta and gamma chains C-terminal domain signature. 
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G. 

NAME: Type I fibronectin domain. 


CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C. 

NAME: Type II fibronectin collagen-binding domain. 
CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x- 
CONSENSUS: [FYWI]-C. 

NAME: Hemopexin domain signature. 
CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY]. 

NAME: Kringle domain signature. 
CONSENSUS: [FY]-C-R-N-P-[DNR]. 

NAME: Kringle domain profile. 

NAME: LDL-receptor class A (LDLRA) domain signature. 
CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)- 
CONSENSUS: C. 

NAME: LDL-receptor class A (LDLRA) domain profile. 

NAME: C-type lectin domain signature. 
CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]- 
CONSENSUS: C. 

NAME: C-type lectin domain profile. 

NAME: Link domain signature. 
CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C. 

NAME: Osteonectin domain signature 1. 
CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P. 

NAME: Osteonectin domain signature 2. 
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ]. 

NAME: Somatomedin B domain signature. 
CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C. 

NAME: Thyroglobulin type-1 repeat signature. 
CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)- 
CONSENSUS: [SG]. 

NAME: P-type 'Trefoil' domain signature. 
CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH]. 

NAME: Cellulose-binding domain, bacterial type. 
CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]. 

NAME: Cellulose-binding domain, fungal type. 
CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C. 

NAME: Chitin recognition or binding domain signature. 
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C. 

NAME: Barwin domain signature 1. 
CONSENSUS: C-G-[KR]-C-L-x-V-x-N. 

NAME: Barwin domain signature 2. 
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C. 

NAME: BIR repeat. 

CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-x-[FYHDA]-x(4)- 
CONSENSUS: tDESL]-X(2,3)-C-X(2)-C-X(6HWA3-X(9)-H-X(4)-[PRSD]-X<:-X(2)-tL^^ 

NAME: WAP-type 'four-disulfide core' domain signature. 
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C. 

NAME: Phorbol esters / diacylglycerol binding domain. 
CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]- 
CONSENSUS: x(2)-C-x(5,9)-C. 

NAME: C2 domain signature. 
CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]. 
NAME: C2-domain profile. 

NAME: CAP-Gly domain signature. 
CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G- 
CONSENSUS: x(2)-[LY]-F. 

NAME: Ly-6 / u-PAR domain signature. 
CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C- 
CONSENSUS: x(12,24)-C. 

NAME: MAM domain signature. 
CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVM]-x- 
CONSENSUS: F-x-[LIVMFY]-x(3)-[GSC]. 

NAME: MAM domain profile. 

NAME: PH domain profile. 

NAME: Phosphotyrosine interaction domain (PID) profile. 

NAME: Src homology 2 (SH2) domain profile. 

NAME: Src homology 3 (SH3) domain profile. 

NAME: VWFC domain signature. 
CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C. 
NAME: WW/rsp5/WWP domain signature. 

CONSENSUS: W.x(9,n)-[VFY)-{FYWJ-x(6.7)-tGSTOEJ-[GSTQCR)-[FYW]-x(2)^^ 
NAME: WW/rsp5/WWP domain profile. 
NAME: ZP domain signature. 

NAME: S-layer homology domain signature. 

™™cH!' P-VFYD.x-[DA]-x(2.5HDNGSATPHYJ-{WYFPDAJ-x(4)-[UV]-x(2)-[GTALV]- 

«W.6)-[UVFYC]-x(2)-O-x-[PGSTAJ-x(2.3)-0^A].x.(PGAV)-x(3./0HUVMA]^ 
CONSENSUS: [STICRl-(RY]-x-(EQJ-x4STALIVM). i v . "M" m^j 

NAME: 'Homeobox' domain signature. 

™™c.^f IUVMFYG]-(ASLVRJ.xa).[IJVMSTACN]-x-[LIVM]-x(4HUV]-(RKNQESTAIY]^ 
CONSENSUS: a-IVFSTNKH]-W-(i^C]-x-(ND(?rAH]-x(S)-lRKNAlMWl. 

NAME: 'Homeobox' domain profile. 

NAME: 'Homeobox' antennapedia-type protein signature. 
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA]. 

NAME: 'Homeobox' engrailed-type protein signature. 
CONSENSUS: L-M-A-Q-G-L-Y-N. 

NAME: 'Paired box' domain signature. 
CONSENSUS: R-P-C-x(11)-C-V-S. 

NAME: 'POU' domain signature 1. 
CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G. 

NAME: 'POU' domain signature 2. 
CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST]. 

NAME: Zinc finger, C2H2 type, domain. 
CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. 

NAME: Zinc finger, C3HC4 type (RING finger), signature. 
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]. 

NAME: Nuclear hormones receptors DNA-binding region signature. 
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R. 

NAME: GATA-type zinc finger domain. 
CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C. 
NAME: Poly(ADP-ribose) polymerase zinc finger domain signature. 
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C. 

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile. 

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature. 
CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)- 
CONSENSUS: C. 

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile. 

NAME: Prokaryotic dksA/traR C4-type zinc finger. 
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C. 

NAME: Copper-fist domain signature. 
CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)- 
CONSENSUS: [KR]-x-[KR]-G-R-P. 

NAME: Copper fist DNA binding domain profile. 

NAME: Leucine zipper pattern. 
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L. 

NAME: bZIP transcription factors basic domain signature. 
CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]. 

NAME: Myb DNA-binding domain repeat signature 1. 
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]. 

NAME: Myb DNA-binding domain repeat signature 2. 
CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]. 

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature. 
CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x- 
CONSENSUS: [VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR]. 

NAME: p53 tumor antigen signature. 
CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R. 

NAME: CBF-A/NF-YB subunit signature. 
CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C. 

NAME: CBF-B/NF-YA subunit signature. 
CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E. 

NAME: 'Cold-shock' DNA-binding domain signature. 
CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]. 

NAME: CTF/NF-I signature. 
CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R. 

NAME: Ets-domain signature 1. 
CONSENSUS: L-[FYW]-[QEDH]-F-[LI]-[LVQK]-x-[LI]-L. 

NAME: Ets-domain signature 2. 
CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y. 

NAME: Ets-domain profile. 

NAME: Fork head domain signature 1. 
CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM]. 

NAME: Fork head domain signature 2. 
CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H. 

NAME: Fork head domain profile. 

NAME: HSF-type DNA-binding domain signature. 
CONSENSUS: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K- 
CONSENSUS: [LIVM]. 

NAME: Tryptophan pentad repeat (IRF family) signature. 
CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x- 


CONSENSUS: [WR]-A. 

NAME: LIM domain signature. 
CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF]. 

NAME: LIM domain profile. 

NAME: NF-kappa-B/Rel/dorsal domain signature. 
CONSENSUS: F-R-Y-x-C-E-G. 

NAME: MADS-box domain signature. 

SSJ1!™-H5- ' [«K]-x(5> -^x-(D^^-»(3) IKRJ-x(2)-T-[FVl-x-[RK](3)-x(2)-n-IVMJ-^ 

^m!^^^*' KC)-A.,.e.o,IVM1^ST1-x-Ux(4HUVM].x.ILIVM](3Vx(6).[LIVMF1.x(2)- 

CONSENSUS^ [FY]- 

NAME: MADS-box domain profile. 
NAME: T-box domain signature 1. 
CONSENSUS: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ]. 

NAME: T-box domain signature 2. 
CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F. 
NAME: TEA domain signature. 

CONSENSUS- J5-N■^■*--^-*(2^Y-^x(3)-[TCl.x(3)-R-T.(R^^ 

NAME: Transcription factor TFIIB repeat signature. 
CONSH^SUS- ^l!^^-^^^^^^ 

NAME: Transcription factor TFIID repeat signature. 
CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x- 
CONSENSUS: [STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM]. 

NAME: TFIIS zinc ribbon domain signature. 
CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]- 
CONSENSUS: x(6)-C-x(2,5)-C-x(3)-[FW]. 

NAME: TSC-22 / dip / bun family signature. 
CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E. 

NAME: Prokaryotic transcription elongation factors signature 1. 
CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]- 
CONSENSUS: x(6)-G-D-x(2)-E-N-[GSA]-x-Y. 

NAME: Prokaryotic transcription elongation factors signature 2. 

CONSENSUS: S-x(2)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE]. 

NAME: DEAD-box subfamily ATP-dependent helicases signature. 
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.
CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

NAME: Fibrillarin signature.

CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].
NAME: MCM family signature. 

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].
NAME: MCM family domain.
NAME: XPA protein signature 1.

CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.
NAME: XPA protein signature 2. 

CONSENSUS: [UVM](2).T-{KR]-T-E.x-K-x.pE]-Y.[LIVMF)(2H-D-x.p^ 
NAME: XPG protein signature 1.

CONSENSUS: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K.
NAME: XPG protein signature 2.

CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM].
NAME: Bacterial regulatory proteins, araC family signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-
CONSENSUS: x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-
CONSENSUS: [FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL].

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile.

NAME: Bacterial regulatory proteins, arsR family signature.
CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ].

NAME: Bacterial regulatory proteins, asnC family signature.

CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-
CONSENSUS: [LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature.

CONSENSUS: [LIVM]-tSTAG]-[RHNWl-x(2)-(UM]-[GA]-x-IUVMFYAJ.(UVSq-[GA]-x-(S^^ 
CONSENSUS: x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-
CONSENSUS: [LIVMF].

NAME: Bacterial regulatory proteins, gntR family signature.

CONSENSUS: [LIVAPKR]-[PILV]-x-[EQTIVMR]-x(2)-[LIVM]-x(3)-[LIVMFYK]-x-[LIVFT]-
CONSENSUS: [DNGSTK]-[RGTLV]-x-[STAIVP]-[LIVA]-x(2)-[STAGV]-[LIVMFYH]-x(2)-[LMA].

NAME: Bacterial regulatory proteins, iclR family signature.

CONSENSUS: EGA]-x(3)-[DSl-x(2)-E-x(6HCSAl-[LIVMJ-lGSA]-x(2)-[LIVMl-P^-[DN]. 
NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)-
CONSENSUS: [LIVMFYAN]-[LIVMC].

NAME: Bacterial regulatory proteins, luxR family signature. 

CONSENSUS: [GDC^x(2)-[NSTAVY]-x(2)-^VHGSTAJ-x(2)-(L^VMFYWCTI-X^UVMITW 
CONSENSUS: [NST]-[LIVM]-x(5)-[NRHSA]-[LIVMSTA]-x(2)-[KR].

NAME: Bacterial regulatory proteins, lysR family signature.

CONSENSUS: [NQICRHSTAG]-[LIVMFVTA]-x{2HSTAGLV]-(STAGJ.x(4)-[UVMYCrQR]-[PSTANL 
CONSENSUS: x-n>STAGQVl-[PSTAGNVMF)-[LIVMFA]-[STAGH)-xaHlJVMin-x<2)-[UV^^ 
CONSENSUS: IRKEAVJ-x(2)-(LIVMFYNTAE)-x{3)-[LlMVTl. 

NAME: Bacterial regulatory proteins, marR family signature.

CONSENSUS: lSTNAl-rLIA]-x-[RNGSl.x(4^(lJVI]-[EIV|-x(2HGESJ-[IJYW]-CUVa-^ 
CONSENSUS: [DN]-[RKQG]-[RK]-x(6)-T-x(2)-[GA].

NAME: Bacterial regulatory proteins, merR family signature. 

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHK]-x(3)-
CONSENSUS: [LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature.

CONSENSUS: G-[UVMFYShx{2.3)-(TS]-[IJVMThx(2).(UVMl-x(5)-(UVQSl.^^ 
CONSENSUS: IGPARl-x-[UVMI1-[FYSTl-x-[HFYl-[FVl.x^DNST)-K-xa^ 

NAME: Transcriptional antiterminators bglG family signature. 
CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK].

NAME: Sigma-54 factors family signature 1.

CONSENSUS: P4UVMl-x-lUVMl-x(2)-[UVMJ-A-x(2)KUVMFl-x(2HHS]-x-^^^ 

NAME: Sigma-54 factors family signature 2.
CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R.

NAME: Sigma-54 factors family profile. 

NAME: Sigma-70 factors family signature 1.

CONSENSUS: PEl-CLrVMF)(2)-[HEQSJ-x-G-x.[UVMFAl-G-L-[LIVMFYEl-x-fGSAM].[UVN^ 
NAME: Sigma-70 factors family signature 2. 

CONSENSUS: lSTN].x(2).[DE<a-[UVMJ4GASl.x(4)-[LIVMF]-[PSTG]-x(3)-[LI^^ 
CONSENSUS: [LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].

NAME: Sigma-70 factors ECF subfamily signature. 

CONSENSUS: [STAm-fPQDELJ-[DE]-|LIVl-(LIVTA]-Q-x.[STAV]-ILIVMI^C)-fLIVMA^ 
CONSENSUS: (COTAIVJ-[UMFVWQ]-x(12J4).(STAP]-[FYV^l-[LIF]-x(2HI^ 

NAME: Sigma-54 interaction domain ATP-binding region A signature.
CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY].

NAME: Sigma-54 interaction domain ATP-binding region B signature.

CONSENSUS: [GS]-x-[LIVMFJ-x(2)-A4DNEQASH]-[GNEK]-G-[SnMl-[LIVMFYl(3)-[DE]^ 
CONSENSUS: [LIVM].

NAME: Sigma-54 interaction domain C-terminal part signature.
CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT].

NAME: Sigma-54 interaction domain profile. 

NAME: Single-strand binding protein family signature 1. 

CONSENSUS: (UVMmNSTh(KRTI-{LIVM)-x4LIVMn(2)-G-(NHRK3-[^^ 

NAME: Single-strand binding protein family signature 2. 

CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature 

CONSENSUS: [GSK]-F-x(2VIUVMH-x(4)4RKEQA]-x(2HRSTJ-x.[GA].x.[KN).^^^^ 
NAME: Dps protein family signature 1. 

CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE].
NAME: Dps protein family signature 2.

CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature. 
CONSENSUS: H-N-H-P-S-G. 

NAME: recA signature. 

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.
NAME: RecF protein signature 1. 

CONSENSUS: P-IED]-x(3)-|UVMJ(2>-x^[GSADl-P-x(2).R-R.x-[Fn-tUV^ 
NAME: RecF protein signature 2.

CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L.
NAME: RecR protein signature.

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R.

NAME: Histone H2A signature.
CONSENSUS: [AC]-G-L-x-F-P-V.

NAME: Histone H2B signature.

CONSENSUS: [KR]-E.[LIVM]-[E(a-T-x(2)-tKRhx.[LIVM]aVx-[PAG]-IDE]-L.x-rK^^ 
CONSENSUS: [LIVM]-[STA]-E-G.

NAME: Histone H3 signature 1. 
CONSENSUS: K-A-P-R-K-Q-L. 

NAME: Histone H3 signature 2. 

CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature.
CONSENSUS: G-A-K-R-H.

NAME: HMG1/2 signature.

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M.

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook).
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature.
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P.

NAME: Bromodomain signature. 
CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-
CONSENSUS: [LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile.

NAME: Chromo domain signature.
CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-
CONSENSUS: [LIVMC].

NAME: Chromo and chromo shadow domain profile. 

NAME: Regulator of chromosome condensation (RCC1) signature 1.
CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T.

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.

NAME: Nuclear transition protein 1 signature.
CONSENSUS: S-K-R-K-Y-R-K. 

NAME: Nuclear transition protein 2 signature 1. 
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S. 
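
The CONSENSUS entries in this listing follow the PROSITE pattern notation: elements are separated by "-", "x" stands for any residue, "[...]" lists the permitted residues, "{...}" lists excluded residues, and "(n)" or "(n,m)" gives a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern might be translated into a regular expression for scanning a protein sequence. The function name prosite_to_regex and the test sequence are hypothetical, and the "<" and ">" terminus anchors of the notation are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative helper only (hypothetical name); N-/C-terminal anchors
    ('<' and '>') are not supported in this sketch.
    """
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element into its residue part and an optional
        # repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            part = "."                          # any amino acid
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            part = residue                      # literal residue or [allowed set]
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(part)
    return "".join(regex_parts)


# Nuclear transition protein 2 signature 1, as listed above.
signature = "H-x(3)-H-S-[NS]-S-x-P-Q-S."
compiled = re.compile(prosite_to_regex(signature))

# Made-up sequence used only to exercise the pattern.
hit = compiled.search("MAHAAAHSNSAPQSG")
if hit:
    print(hit.start(), hit.group())
```

A regular expression is used here only because the notation maps onto it almost element by element; a production scanner would also need to handle anchors and overlapping matches.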

NAME: Nuclear transition protein 2 signature 2. 
CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.

CONSENSUS: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-
CONSENSUS: [LMF]-P-[DENSTK].

NAME: Ribosomal protein L2 signature.

CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE].

NAME: Ribosomal protein L3 signature.

CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.
NAME: Ribosomal protein L5 signature. 

CONSENSUS: [LIVM]-x(2)-rL!VM)-lSTACJ-IGE]-lQVJ-x(2)-[UVMA]-x.tSTC]-x-[STAGl-^^ 
CONSENSUS: x-[STA].

NAME: Ribosomal protein L6 signature 1.
CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM].

NAME: Ribosomal protein L6 signature 2.

CONSENSUS: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR].

NAME: Ribosomal protein L9 signature.

CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN].
NAME: Ribosomal protein L10 signature.

CONSENSUS: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R.
NAME: Ribosomal protein L11 signature.

CONSENSUS: [RKNI-x-[LIVMl-x-G.[ST]-x(2)-[SNQl-lLIVM]<5-x(2).[UVM].^^^ 

NAME: Ribosomal protein L13 signature.

CONSENSUS: [LIVM][KRV]-[GK]-M-[LIV]-[PS]-x(4.5)-(GS)-[NQEKRA)-x(5)-[UVM]-^^ 
CONSENSUS: [LFY]-x-[GDN].

NAME: Ribosomal protein L14 signature.

CONSENSUS: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV].
NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2)-[GAL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-
CONSENSUS: [LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1. 

CONSENSUS: IKRJ-R x-[GSACl-lKQVAl-(UVM>W-(LIVMJ-[KRWLIVM]-[LFV]-(^^ 

NAME: Ribosomal protein L16 signature 2.
CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR].
NAME: Ribosomal protein L17 signature.

CONSENSUS: I-x-lST]-CGT]-x(2).[KR3.x-K-x(6HDEJ-x-[LIM\1-[UVN^ 
NAME: Ribosomal protein L19 signature. 

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R.
NAME: Ribosomal protein L20 signature.

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].
NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.
NAME: Ribosomal protein L22 signature.

CONSENSUS: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM].
NAME: Ribosomal protein L23 signature. 

CONSENSUS: fRKJ(2HAM]-IIVFYTI-tIV]-[RKT]-L.[STANQKl-x(7)-aJ^ 
NAME: Ribosomal protein L24 signature. 

CONSENSUS: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KA]-[GN]-x(2,3)-[GA]-x-[IV].

NAME: Ribosomal protein L27 signature.
CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G.

NAME: Ribosomal protein L29 signature. 

CONSENSUS: [KNQS]-[I>STL].x(2).[lJMFAHKRGSANl-xH3-IVYSTA]-[KRHKRH].[DESTA 
CONSENSUS: [LIV]-A-[KRCQVT]-[LIVMA].

NAME: Ribosomal protein L30 signature. 

CONSENSUS: [IVTJ-[UVM]-x(2^[Ln-x-[U]-x-CKRHQEG]-x(2HSTNQHl-x-nVT1- 
CONSENSUS: x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature.

CONSENSUS: H.P-F^.rni.x(9)^-R-[AVl-x-[KR]. 

NAME: Ribosomal protein L33 signature. 

CONSENSUS: Y-x.[STI-x^RWSJ-x(4).(PAT)-x(l,2)-(UVM]-[EAl-x(2)-K-[l^ 
NAME: Ribosomal protein L34 signature.

CONSENSUS: K-^RG].T*(FYWL]-tEQS^x(5HKRHS3-x(4.5H^•F.x(2).R. 
NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].
NAME: Ribosomal protein L36 signature. 

CONSENSUS: C-x{2K-x(2)-[LIVM]-x-R-x(3)-[IJVMN)-x-[UVMl-x-C-x(3,4HKRl.H-x^^^ 
NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3^fKRJ-x(2)-A-[UVTl.x-S-A.[UV].x-A-[ST14SGAl-x(^^ 

NAME: Ribosomal protein L6e signature. 

CONSENSUS: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K.

NAME: Ribosomal protein L7Ae signature.

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V.

NAME: Ribosomal protein L13e signature.

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.
NAME: Ribosomal protein L15e signature.

CONSENSUS: [DEI-(KR)-A.R-x-L-G-tFYl.x-[SAP3-xa)^[LIVMFYl(4)-R.^^^^ 
NAME: Ribosomal protein L18e signature.

CONSENSUS: IKI^-x.L-x(2>n»S]-[KR).x(2)-[RH].[PSA].x4LIVMl-[NS]-a 
CONSENSUS: [LIVM]. 

NAME: Ribosomal protein L19e signature. 

CONSENSUS: R-x-[KR>x(5).(KR)-x(3).[KRH3-xa)^x-G-x-R-x^-x(3M-R-x(3)^ 
CONSENSUS: x(2)-W-x(7)-R-x(2)-L-x(3)-R.
NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G.
NAME: Ribosomal protein L24e signature. 

CONSENSUS: [FY]-x-[GS]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D.

NAME: Ribosomal protein L27e signature.
CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F.

NAME: Ribosomal protein L30e signature 1.

CONSENSUS: [STA]-x(5)-G-x-[QKR]-x(2)-[LIVM]-[KQT]-x(2)-[KR]-x-G-x(2)-K-x-[LIVM](3).

NAME: Ribosomal protein L30e signature 2.

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.
NAME: Ribosomal protein L31e signature.

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AK]-x-W-x-[KR]-G.
NAME: Ribosomal protein L32e signature.

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G.

NAME: Ribosomal protein L34e signature.
CONSENSUS: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G.

NAME: Ribosomal protein L35Ae signature. 

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P.
NAME: Ribosomal protein L36e signature.

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR].
NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.
NAME: Ribosomal protein L39e signature.

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R.

NAME: Ribosomal protein L44e signature.
CONSENSUS: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C.

NAME: Ribosomal protein S2 signature 1.

CONSENSUS: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G.

NAME: Ribosomal protein S2 signature 2. 

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-
CONSENSUS: [GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature.

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-
CONSENSUS: [DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G.

NAME: Ribosomal protein S4 signature.

CONSENSUS: [LIVM]-[DE]-x-R-L-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCF]-x-[ST]-x(3)-
CONSENSUS: [SAI]-[KR]-x-[LIVMF](2).

NAME: Ribosomal protein S5 signature.

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-
CONSENSUS: [LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature.

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA].
NAME: Ribosomal protein S7 signature. 

CONSENSUS: [DENSK]-x-[LIVMET]-x(3)-[LIVMFT](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMF]-
CONSENSUS: x(2)-[STA].

NAME: Ribosomal protein S8 signature.

CONSENSUS: [GE]-x(2)-[LIV](2)-[STY]-T-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI].
NAME: Ribosomal protein S9 signature. 

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].
NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.

NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-
CONSENSUS: x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature.
CONSENSUS: [RK]-x-P-N-S-[AR]-x-R.

NAME: Ribosomal protein S13 signature.

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q.
NAME: Ribosomal protein S14 signature.

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].
NAME: Ribosomal protein S15 signature.

CONSENSUS: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-
CONSENSUS: [FY].

NAME: Ribosomal protein S16 signature.

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR].
NAME: Ribosomal protein S17 signature.

CONSENSUS: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.

NAME: Ribosomal protein S18 signature.

CONSENSUS: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-
CONSENSUS: [GY]-K-[LIVM]-x(3)-R-[LIVMAS].

NAME: Ribosomal protein S19 signature. 

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-
CONSENSUS: x(2)-[ST].

NAME: Ribosomal protein S21 signature.

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR].

NAME: Ribosomal protein S3Ae signature. 

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L.

NAME: Ribosomal protein S4e signature.
CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR].

NAME: Ribosomal protein S6e signature. 

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M.

NAME: Ribosomal protein S7e signature.

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H.

NAME: Ribosomal protein S8e signature.

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.
NAME: Ribosomal protein S12e signature.

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L.

NAME: Ribosomal protein S17e signature.
CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H.

NAME: Ribosomal protein S19e signature. 

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ].

NAME: Ribosomal protein S21e signature.
CONSENSUS: L-Y-V-P-R-K-C-S-[SA].

NAME: Ribosomal protein S24e signature.

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature.
CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature. 

CONSENSUS: IQK]^-x(2K:-x(6VF-CGS3-x-lPSA]-x(5)-C-x(2)<:-[GSJ-xa)-L-x(^^^ 

NAME: Ribosomal protein S28e signature. 
CONSENSUS: E-[ST]-E-R-E-A-R-x-L.

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature.
CONSENSUS: G-F-R-G-E-A-L.

NAME: DNA mismatch repair proteins mutS family signature. 

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GC]-[RKH]-G-[GST]-x(4)-G.
NAME: mutT domain signature.

CONSENSUS: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E.
NAME: DnaA protein signature. 

CONSENSUS: I-[GA]-x(2)-[LIVMF]-[SGDNK]-x(0,1)-[KR]-x-H-[STP]-[STV]-[LIVM](2)-x-
CONSENSUS: [SA]-x(2)-[KRE]-[LIVM].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1.
CONSENSUS: K-x-E-[LIV]-A-x-[DE]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2.
CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2).

NAME: Zinc-containing alcohol dehydrogenases signature.
CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.
CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].

NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-
CONSENSUS: [LIVMF]-x(4)-E.

NAME: Iron-containing alcohol dehydrogenases signature 2.
CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x(2)-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-
CONSENSUS: [LIVMT]-x-[HNS]-[GA]-x-[GTAC].

NAME: Short-chain dehydrogenases/reductases family signature.

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-
CONSENSUS: [LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1.

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.
NAME: Aldo/keto reductase family signature 2.

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].
NAME: Aldo/keto reductase family putative active site signature.

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].
NAME: Homoserine dehydrogenase signature.

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.
NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature.

CONSENSUS: G-lAT]-[LIVM]-K^DN)-[LIVMJ{2)-A-x-[GAl-x<J-[UVMF3-x-[DEl-G-ILIVM^ 
CONSENSUS: [LIVMFYW]-G-x-N.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.
CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2.
CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A.

NAME: Mannitol dehydrogenases signature.

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].
NAME: Histidinol dehydrogenase signature. 

CONSENSUS: I-D-x<2)-A-G-P-lST|-E-(UVS)-[UVMAl(3HAC)-x(3)-A-x(4HLIVMl-[AV^ 
CONSENSUS: [SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H.

NAME: L-lactate dehydrogenase active site.
CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature. 
CONSENSUS: [UVMAl-[AG]-[IVT)-(LIVMFyi-[AG]-x-G-[NHKRQGSACl-[UVl^^ 
CONSENSUS: [LIVMT]-x(2)-[FYWCTH]-[DNSTK].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2. 

CONSENSUS: lUVMI^Al-[LIVFYWq.x(2HSACHDNQHRl-[IVFA]-JLIVF3-x-[UVFl-Q^ 
CONSENSUS: P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].
NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3

NAME: 3-hydroxyisobutyrate dehydrogenase signature

CONSENSUS: [LIVMFY](2)-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1
CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2
CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile. 

NAME: 3-hydroxyacyl-CoA dehydrogenase signature 

NAME: Malate dehydrogenase active site signature 

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY].
NAME: Malic enzymes signature. 

CONSENSUS: F-x-fDV].D-xa)<3-T^GSAJ-x-tIV].x-aiVMAHGASn(2HUVMF](2). 

NAME: Isocitrate and isopropylmalate dehydrogenases signature

NAME: 6-phosphogluconate dehydrogenase signature
CONSENSUS: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site
CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.

NAME: Bacterial quinoprotein dehydrogenases signature 1

CONSENSUS: rDENl.W.x(3)-G-puC]-x(6)-[FYW].S-x(4HLlVM].N.x(2>-N.^^^ ' 

NAME: Bacterial quinoprotein dehydrogenases signature 2

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site
CONSENSUS: S-N-H-G-[AG]-R-Q.

NAME: GMC oxidoreductases signature 1

g*jy-*-n^-<^HG^(2).x-WN.x(3MFyWA^ 

NAME: GMC oxidoreductases signature 2 

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.
NAME: Eukaryotic molybdopterin oxidoreductases signature

NAME: Prokaryotic molybdopterin oxidoreductases signature 1
Ss^lls; g'^^'-'^-3>^-I^A«HGSTVMFl-x<^^^ 

NAME: Prokaryotic molybdopterin oxidoreductases signature 2

CONSENSUS: tSTA^x.|STAq(2).x(2)-[STA]-D-[UVMYle)-L-P-x-[S^ACK2).x(2)-E. 
NAME: Prokaryotic molybdopterin oxidoreductases signature 3
NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV].

NAME: Aldehyde dehydrogenases cysteine active site.

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature.

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site.
CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM].

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site.

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P.
NAME: Gamma-glutamyl phosphate reductase signature.

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature. 
CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A.

NAME: Dihydroorotate dehydrogenase signature 1.

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].
NAME: Dihydroorotate dehydrogenase signature 2.

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.
NAME: Coproporphyrinogen III oxidase signature.

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site
CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G.

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].

NAME: Acyl-CoA dehydrogenases signature 2.

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1
CONSENSUS: G-[LIVM)-P-x-E<3)-N.E-x(1.3)-R-V-A-x-(STl P-x^GSTI-V-x(2)-L-x-rKRH) 
CONSENSUS: x^. y r i j 

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2
CONSENSUS: (LIVMl(2)-G^GAl-G-x-A<i-x(2HSA^x(3>(GA^x-[SG^[LIVMl^A-x-V. 
CONSENSUS: x(3)-D. 

NAME: Glu / Leu / Phe / Val dehydrogenases active site. 
CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].

NAME: D-amino acid oxidases signature.

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.

NAME: Copper amine oxidase topaquinone signature.

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature.
CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P.

NAME: Lysyl oxidase putative copper-binding region signature.
CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H.

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PAUn-x(2.3HUVl-x(3).[UVM]-[STAC)^STV).x-tGAN]-G-x-T-xaV 
CONSENSUS: [LIV]-x(2)-[LMF]-[DENQK].

NAME: Dihydrofolate reductase signature. 

CONSENSUS: [LVAGC]-[UI^-G-x(4HLIVMFl-P-W-x(4.5)-tDE]-x(3)-CFYIV]-x(3)-[STIQ^ 
NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1.

CONSENSUS: (EQJ-x.[EQKl-[UVM](2M2)-[UVMl-xa)-[UVMYl-N.x-PN)-x(3>-[L^ 
CONSENSUS: Q-L-P-[LV].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2
CONSENSUS: P-G-G-V-G-P-[MF]-T-[IV].

NAME: Oxygen oxidoreductases covalent FAD-binding site.

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site
CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site
CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1 

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2 

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G.

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT].

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature
CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P.

NAME: Respiratory-chain NADH dehydrogenase 30 Kd subunit signature

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS].

NAME: Respiratory-chain NADH dehydrogenase 49 Kd subunit signature
CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1
CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2
CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1 
CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2
CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3 
CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site
CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature
CONSENSUS: ^VWGl.[LIVFYWTA]a)-EVGS^H-^LNP^x-V-x{44,4^^^ 

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature
CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M.

NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature
CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

NAME: Multicopper oxidases signature 1.

CONSENSUS: G.x.IFYWl-x-[UVMFYW]-x-[CST|.x(8)<}-rU^ 

NAME: Multicopper oxidases signature 2.
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature. 

CONSENSUS: n5EIl-[UVMTA].x(2)-[LIVMl-[mMSTAGJ.[SAGl-[LIV^ 
NAME: Peroxidases active site signature.

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].
NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH].

NAME: Catalase proximal active site signature.

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].
NAME: Glutathione peroxidases selenocysteine active site.

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2.
CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F.

NAME: Lipoxygenases iron-binding region signature 1.

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E.

NAME: Lipoxygenases iron-binding region signature 2.

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature.

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E.
NAME: Intradiol ring-cleavage dioxygenases signature. 

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]-
CONSENSUS: x(6)-G-x-[FY].

NAME: Indoleamine 2,3-dioxygenase signature 1.
CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q.

NAME: Indoleamine 2,3-dioxygenase signature 2.

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR].

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature. 
CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H.

NAME: Bacterial luciferase subunits signature.

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR].

NAME: ubiH/COQ6 monooxygenase family signature.
CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D. 

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature. 
CONSENSUS: P-D-x(2)-H-PE]-[Ln-[UVMFl-G-H-(LIVMC]-P. 

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.
CONSENSUS: H-H-M-x(2)-F-x-C.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 2.
CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G.

NAME: Tyrosinase CuA-binding region signature.

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E.

NAME: Tyrosinase and hemocyanins CuB-binding region signature.
CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D.

NAME: Fatty acid desaturases family 1 signature.
CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y.

NAME: Fatty acid desaturases family 2 signature. 

CONSENSUS: [S^^^SA^x(3HQR^[Ln-x(5.6)-D.Y.x(2)-[UVMFYW]4LIVM^[DEJ. 

NAME: Cytochrome P450 cysteine heme-iron ligand signature.
CONSENSUS: [FW].[SGNHl-x-(GD]-x-[RHFnx-C-[LIVMFAPJ-[GAD]. 

NAME: Heme oxygenase signature. 
CONSENSUS: L-L-V-A-H-A-Y-T-R. 

NAME: Copper/Zinc superoxide dismutase signature 1. 

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD].

NAME: Copper/Zinc superoxide dismutase signature 2.
CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV].
NAME: Manganese and iron superoxide dismutases signature
CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature.
consensus" ^^*^2^f^^-*<^»''>-^-t'^'^Ml-[FYRAl-[NH].x 

NAME: Ribonucleotide reductase small subunit signature. 

CONSENSUS: riVMSEQ]-E.x(I.2)-[IJTrA]-[HY]-[GSA]-x-[STAVM]-Y.x(2^rUVM01-xO^^ 
CONSENSUS: [LIFY]-[IVFYCSA].

NAME: Nitrogenases component 1 alpha and beta subunits signature 1
CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C.

NAME: Nitrogenases component 1 alpha and beta subunits signature 2.

CONSENSUS: [STANQ]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST].

NAME: NifH/frxC family signature 1.

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.

NAME: NifH/frxC family signature 2.
CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P.

NAME: Nickel-dependent hydrogenases large subunit signature 1.
CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C.

NAME: Nickel-dependent hydrogenases large subunit signature 2.
CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature. 

n^M™^c' H-lLIVMNxa)-(UVM]-fGSTAC](3HUVM]-(DEQl-S-[LIVMA).p^^ 
CONSENSUS: x-[QRJ-{IV).(Lrr)-fSTAG]-Q-[LIVM]-iKRl. ^ 

NAME: Bacterial-type phytoene dehydrogenase signature.

CONSENSUS.- 

NAME: Glycine radical signature.

CONSENSUS: [STIV]-x-R-[IVT]-[CSA]-G-Y-x-[GACV].

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1 
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R.

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2 
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.
CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C.

NAME: RNA methyltransferase trmA family signature 1.

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2.
CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site.
consensus! |f:^^^f™-''<3HFW).[QNJ.x^ 

NAME: Ribosomal RNA adenine dimethylases signature.

^^r!c^^!^o* flJVMl.[LIVMFY]-[DE]-x-G-[STAPV]<3-x.[GAl-x-[I^ 

CONSENSUS: x(6)-[LIVMY]-x-ESTAGVj-[LIVMFYHC]-E-x-D. J *UMi^vMj 

NAME: Methylated-DNA-protein-cysteine methyltransferase active site.
CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature.
CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].

NAME: N-4 cytosine-specific DNA methylases signature.
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site. 

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.
NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM].

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature.
CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-L

NAME: Uroporphyrin-III C-methyltransferase signature 1.

CONSENSUS: [LIVMJ-IGSHSTALl-G.p.G-x(3)-(LIVMFY]-[LIVMl-T-[LIVM)-[KIWQ^^ 

NAME: Uroporphyrin-III C-methyltransferase signature 2. 

CONSENSUS: V-x(2)-fLII-x(2)-G-D-x(3)-[FYW]-IGS]-x(8)-[UVH-x(5.6HLIVMFYWP 
CONSENSUS: x-[UVMYJ-x-P-G. 

NAME: ubiE/COQ5 methyltransferase family signature 1.
CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W.

NAME: ubiE/COQ5 methyltransferase family signature 2.

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S.

NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site.

CONSENSUS: [DEH]-[LIVMFYl-x4STMVl-IGSTI-[STJ(2)-H-K-(Sn-[LFJ-x^[PAC]-rRQI- 

CONSENSUS: IGSAJ-[GA]. 

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS: G-x-(STM]>{IVT]-x-[nnVVQ>[VMATl-x-[DEVMl-x-tUVMY^D-x-^^^^ 
CONSENSUS: x(6)-(LIVM] . 

NAME: Aspartate and ornithine carbamoyltransferases signature.
CONSENSUS: F-x-[EK]-x-S-[GT]-R-T.

NAME: Transketolase signature 1.

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-
CONSENSUS: [LIMC]-[GS].

NAME: Transketolase signature 2.

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-
CONSENSUS: [STAP]-x(2)-[RGA].

NAME: Transaldolase signature 1.

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).

NAME: Transaldolase active site. 

CONSENSUS: [UVM]-x-flJVM]-K-[UVM]-n*ASJ-x^STl-x-[DENQPASl-G-lIJ^ 
CONSENSUS: IQEKRSTJ-x-[LIVM]. 

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.
CONSENSUS: [U]-P-x-[LVP]-P-(IVTA]-P-x-[UVM]-x-[DENQASI-[STl-[UV^ 

NAME: Acyltransferases ChoActase / COT / CPT family signature 2. 

CONSENSUS: RHnfW)-x-(DAl-[KA)-x(0,l)-(UVMFn-x-[UVMI^C2>x<3)-[DNSl-IGS 
CONSENSUS: [DEh[HSl-x(3)-PEJ-[GA]. 

NAME: Thiolases acyl-enzyme intermediate signature.

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-
CONSENSUS: [LIVM].

NAME: Thiolases signature 2. 

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.
NAME: Thiolases active site. 

CONSENSUS: [AG3-[UVMA3-[STAGLIVMl-[STAG]-(LIVMA]<:.x-(AG].x-[AG]-x.(A 

NAME: Chloramphenicol acetyltransferase active site.
CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-
CONSENSUS: [STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].
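Longer patterns in this listing are split across two or more consecutive CONSENSUS lines, the first ending in a "-" separator. A minimal sketch of how the NAME/CONSENSUS records might be collected, with continuation lines rejoined into a single pattern, is given below. The function name and the two-line sample record are hypothetical and serve only as illustration.

```python
def parse_signatures(text):
    """Collect NAME/CONSENSUS records, joining continuation CONSENSUS lines."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("NAME:"):
            # Start a new record; drop the trailing period of the name.
            name = line[len("NAME:"):].strip().rstrip(".").strip()
            records[name] = ""
        elif line.startswith("CONSENSUS:") and name is not None:
            # Append this fragment to the pattern of the current record.
            records[name] += line[len("CONSENSUS:"):].strip()
    return records


# The second record is a hypothetical pattern split over two lines.
SAMPLE = """\
NAME: Heme oxygenase signature.
CONSENSUS: L-L-V-A-H-A-Y-T-R.

NAME: Hypothetical two-line signature.

CONSENSUS: G-x(2)-
CONSENSUS: [LIVM]-D.
"""

records = parse_signatures(SAMPLE)
```

A NAME entry that is not followed by any CONSENSUS line, such as a profile entry, is kept with an empty pattern so that it can be filtered out or handled separately.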

NAME: Beta-ketoacyl synthases active site. 

CONSENSUS: G.x(4HLIVMFAPl-x(2HAGCl-C-[STA](2HSTAG]-x(3)-|JJV^ 
NAME: Chalcone and stilbene synthases active site.
CONSENSUS: [^^'-j'^"'^SJ.x-[LIVM)-x.(QHGJ..-G-C-[FYNA]-[GAJ.G^^^

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 1.
CONSENSUS: E-I-N-F-L-C-x-H-K. 

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 2.

CONSENSUS: K-F-G-x-G-D-G. 

NAME: Gamma-glutamyltranspeptidase signature 

T^-fSTAJ.H.x-lST|4LIVMA3-x(4)^[SN].x.V-[STAhx.r^^^^ 
CONSENSUS: x(I,2)-[FY]-G. J 

NAME: Transglutaminases active site. 

CONSENSUS: [GTl^^[CA).W-V.x-tSA)-[GAl-IIVT|-x(2>-T-x.fU4SC].MCSAJ^^^ 

NAME: Phosphorylase pyridoxal-phosphate attachment site.
CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature.

^S^cfl^oH^' f'^'''<2)^»(2).[LIVMYAl-[LIMV]-x(4.6)-[LVGACHLVFyA3-[LIVMFl-[CT^ 

^^Mc^Mc,^c f"NQJ-fSTAGC]-G.x(2)-tSTAG]-x(3HSTAGLJ-[UVMFAlWlPQRl-^^^ 
CONSENSUS: x(3)-(PAJ.x(3HDES3-[QEHN]. v ;i v Jl * mij 

NAME: Purine/pyrimidine phosphoribosyl transferases signature 
NAME: Glutamine amidotransferases class-I active site. 

CONSENSUS: tPAS]-[LIVMFYTl^IVMFY]<5.[UVMFYl^-[LIVMFYNl.G-x.^ 

NAME: Glutamine amidotransferases class-II active site. 
CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature 
CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature. 

CONSENSUS: [LIV]-x(3>-G-x(2^H-x-[UVMFn-x(4HUVMF]-x(3)-[ATV]-x{l,2)-{LIVM].x^ 
CONSENSUS: ^ATV]-x(4^[GN]-x(3.4HLIVMF)(2)-x(2).(STN]-[SA)-x-C-[GS]-[LIVMJ. 

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature.
CONSENSUS: S-CGSl.R-[GAl-IUV)-x(2)-rrAJ-(GAl^-T-x-D-x-[ljv^ 

NAME: ATP phosphoribosyltransferase signature.

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].

NAME: NAD:arginine ADP-ribosyltransferases signature.
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature.
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.

NAME: S-adenosylmethionine synthetase signature 1 
CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y.

NAME: S-adenosylmethionine synthetase signature 2.
CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2. 

CONSENSUS: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].
NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y-{CSAMKxa)-(VSGl-A4GSA)-IUVATl-(IV]^-xaHU«SCFx^^ 

NAME: Squalene and phytoene synthases signature 2. 

SS^cI!?' ^lJVM)<^x(3M^x(2.3^N-[IF]-x-R-D.lUVMFY^ 
CONSENSUS: x-P. 

NAME: Protein prenyltransferases alpha subunit repeat signature.
CONSENSUS: IPSIAV]-x-[NDFV]-[NEQIY).x4UVMAGIl-WKNQS^ 
NAME: Riboflavin synthase alpha chain family signature.

CONSENSUS: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1.

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].
NAME: Dihydropteroate synthase signature 2. 

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.
NAME: EPSP synthase signature 1. 

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA].
NAME: EPSP synthase signature 2. 

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-

CONSENSUS: [KRA]-[LIVMF]-G.

NAME: FLAP/GST2/LTC4S family signature. 
CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C.

NAME: Aminotransferases class-I pyridoxal-phosphate attachment site.

CONSENSUS: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-
CONSENSUS: [GA].

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site.

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG].

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]-x(0,1)-[LIVMFYWAG]-x(0,1)-[SACR]-x-
CONSENSUS: [GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].

NAME: Aminotransferases class-IV signature. 

CONSENSUS: E-x-[STAGCn-x(2)-N-aiVMFACHFY]-x(6.12)-[UVMH-x-T-x(6,8)-(UVMh^^ 
CONSENSUS: [GS]-[LIVM]-x-(KR). 

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site.

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-
CONSENSUS: x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC].

NAME: Hexokinases signature. 

CONSENSUS: [UVMI-G-F-[TNJ-F-S-[FY]-P-x(5HUVMl-PNST|<3HUVM].^^^ 
CONSENSUS: [LFl. 

NAME: Galactokinase signature.

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y.

NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature.

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.

NAME: pfkB family of carbohydrate kinases signature 1.
CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2. 

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].

NAME: ROK family signature. 

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)-
CONSENSUS: G-[RKH].

NAME: Phosphoribulokinase signature.

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E.

NAME: Thymidine kinase cellular-type signature.

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].
NAME: FGGY family of carbohydrate kinases signature 1.

CONSENSUS: IMFVGS]-x-pSTI-xa)-K-[LIVMFYW]-x-W^UVMF]-x-[DENQTO 
NAME: FGGY family of carbohydrate kinases signature 2. 

CONSENSUS: (GSA]-x.[LIVMFYW].x-G-ILIVM].x(7.8)-CHDEN(M-[UVMF]-x(2HAS]4S^^ 
CONSENSUS: {UVMFY)-EDEQ). 
NAME: Protein kinases ATP-binding region signature.
CONSENSUS: x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.

NAME: Serine/Threonine protein kinases active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile. 

NAME: Casein kinase II regulatory subunit signature.

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C.
NAME: Pyruvate kinase active site signature. 

CONSENSUS: [UVACJ-x-|lJVM3(2HSAPCV]-K.(UV3-E-INKRSTI-x-nOEQHJ.(GCT^ 

NAME: Shikimate kinase signature.

CONSENSUS: [KRI-x(2)-E-x(3)-CLIVMF)-x(8.12>ILIVMFI(2).[SAJ.x<}(3)-x-[LIVMF], 

NAME: Prokaryotic diacylglycerol kinase signature. 
CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D.

NAME: Phosphatidylinositol 3- and 4-kinases signature 1 

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.
NAME: Phosphatidylinositol 3- and 4-kinases signature 2. 

CONSENSUS: [C3S]-x-(AV]-x(3)-[LIVM]-x(2)-IFYH]-[UVM)(2)-x-(LIVMF]-x-M^ 

NAME: Acetate and butyrate kinases family signature 1 . 
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2. 

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G.

NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.
NAME: Aspartokinase signature.

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM].

NAME: Glutamate 5-kinase signature.

^^^cMc ^GST^q-xa)-C^x<HCK:HIMl-x-ISTA^K-(UVM^MSA]-^rcA]-IaHGALV^ 
CONSENSUS: x(3)-G. 

NAME: ATP:guanido phosphotransferases active site. 
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.

NAME: PTS HPR component histidine phosphorylation site signature.
CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature 
CONSENSUS: tGSADE^[KREQTVl-x(4^[KRN]-S-IUVMF]a^^^ 

NAME: PTS EIIA domains phosphorylation site signature 1 
CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2 

CONSENSUS: ^>EN<a-x(6)-DJVMI^^GA^xa>[LIVM^A-ILIVMl.P-H.IGACl. 

NAME: PTS EIIB domains cysteine phosphorylation site signature 
CONSENSUS: N-[LIVMFYl-x(5)-C-x-T-R-[UVMF]-x-[LIVMIl-x-ILIVM)^ 

NAME: Adenylate kinase signature. 

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site.
CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE].

NAME: Guanylate kinase signature.

CONSENSUS: T.[STI-R-x(2)-(KR]-x(2>PEl-x(2K5-x(2^Y.x-n^HUVl^^ 
NAME: Guanylate kinase domain profile.

NAME: Phosphoribosyl pyrophosphate synthetase signature.

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature.
CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2).

NAME: Bacteriophage-type RNA polymerase family active site signature 1.
CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2.
CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.

NAME: Eukaryotic RNA polymerase II heptapeptide repeat.
CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].

NAME: RNA polymerases beta chain signature.

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].
NAME: RNA polymerases M / 15 Kd subunits signature.

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.

NAME: RNA polymerases D / 30 to 40 Kd subunits signature. 

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(9)-[SA]-x(3)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]-
CONSENSUS: [GA]-x-R-[LI]-[GA]-[LIVM](2)-P.

NAME: RNA polymerases H / 23 Kd subunits signature. 
CONSENSUS: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE].

NAME: RNA polymerases K / 14 to 18 Kd subunits signature. 
CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q.

NAME: RNA polymerases L / 13 to 16 Kd subunits signature. 

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature. 
CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature. 

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].
NAME: DNA polymerase family B signature. 

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].

NAME: DNA polymerase family X signature. 

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.
CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q.

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.
CONSENSUS: D-L-P-I-V-G-G-ISTJ-[LIVMl(2)-[SA)-H.[DEN].H.rFY]^J-G-G. 

NAME: ADP-glucose pyrophosphorylase signature 1.

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].

NAME: ADP-glucose pyrophosphorylase signature 2.
CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3.

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].
NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-
CONSENSUS: [LIVMFT]-D.

NAME: Ribonuclease PH signature.

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.

NAME: 2'-5'-oligoadenylate synthetases signature 1.

CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG].

NAME: 2'-5'-oligoadenylate synthetases signature 2.
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.

NAME: CDP-alcohol phosphatidyltransferases signature.
CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D. 

NAME: PEP-utilizing enzymes phosphorylation site signature.

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2. 
NAME: Rhodanese signature 1. 

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].
NAME: Rhodanese C-terminal signature.

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].
NAME: CoA transferases signature 1. 

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P.

NAME: CoA transferases signature 2.

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site. 
CONSENSUS: C-C-x(2)-H-x(2)-C.

NAME: Phospholipase A2 aspartic acid active site. 
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.

NAME: Lipases, serine active site.

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature. 
CONSENSUS: Y-x(2)-Y-Y-x-C-x-C.

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site. 
CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site.
CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site.
CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site.

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2.

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].
NAME: Pectinesterase signature 1.

CONSENSUS: [GSTN]-x(5)-[LIVMH-[LIVM1.x(2K;.x-Y-[DNK3.E-x.(LIV^^^^ 

NAME: Pectinesterase signature 2. 
CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.

NAME: Peptidyl-tRNA hydrolase signature 1.

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE].

NAME: Peptidyl-tRNA hydrolase signature 2. 

CONSENSUS: [G5]-x(3)-H-N-G-[LIVM]-[ICRHDNS]-[LIVMD. 

NAME: Alkaline phosphatase active site. 

CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature 

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature. 

CONsSs: g^J''^•*-^^^AG>«(2HSTA0^*D-ISTANQ^,^UV^ 
NAME: Class A bacterial acid phosphatases signature.
CONSENSUS: G-S-Y-P-S-G-H-T.
NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2.

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1-6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature.
CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.
CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N. 

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2.
CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature. 

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site.

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile. 

NAME: Inositol monophosphatase family signature 1. 

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY].
NAME: Inositol monophosphatase family signature 2. 

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature.
CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile.

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile.

NAME: 3'5'-cyclic nucleotide phosphodiesterases signature.
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature. 

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].
NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.
NAME: Sulfatases signature 2. 

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1. 
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2. 

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3. 

CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

NAME: AP endonucleases family 2 signature 1 . 
CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG].

NAME: AP endonucleases family 2 signature 2. 
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H. 

NAME: AP endonucleases family 2 signature 3. 

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D.
NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2. 
CONSENSUS: G-D-F-N-A-x-C-[SA].

NAME: Endonuclease III iron-sulfur binding region signature. 
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C.

NAME: Endonuclease III family signature.

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-
CONSENSUS: x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK].

NAME: Ribonuclease II family signature. 

CONSENSUS: ^HIJ-[FYE)-[GSTAM]-[LIVM]-x(4.5)-Y.[STALJ-x-^FWVACJ-^IV^^SAl-P-[L^m 
CONSENSUS: fRQ]-[KR).(FY)-x-D-x(3)-lHQ]. 

NAME: Ribonuclease III family signature.

CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].

NAME: Bacterial Ribonuclease P protein component signature. 

CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2. 

CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature. 

CONSENSUS: C-K-x(2)-N-T-F. 

NAME: DNA/RNA non-specific endonucleases active site. 
CONSENSUS: D-R-G-H-[QIL]-x(3)-A.

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.

NAME: Thermonuclease family signature 2.

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1.
CONSENSUS: H-x-C-G-G-N-V-G-D. 

NAME: Beta-amylase active site 2.

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature. 
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site. 

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.

NAME: Clostridium cellulosome enzymes repeated domain signature.

CONSENSUS: D-IL!VMI^-(DNV].x-[DNS].xaHUVM]-IDN)-(SAIJ4].x-D-xO^ 

CONSENSUS: IRKSj-x-ILIYMF]. 

NAME: Chitinases family 18 active site.

CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.
NAME: Chitinases family 19 signature 1.

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].

NAME: Chitinases family 19 signature 2. 

CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM].

NAME: Alpha-lactalbumin / lysozyme C signature. 
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.

NAME: Alpha-galactosidase signature.

CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,4)-R-[DNSF].
NAME: Trehalase signature 1. 



CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2. 

CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P.

NAME: Alpha-L-fucosidase putative active site.
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C.

NAME: Glycosyl hydrolases family 1 active site. 

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].
NAME: Glycosyl hydrolases family 1 N-terminal signature.

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].

NAME: Glycosyl hydrolases family 2 signature 1.

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-
CONSENSUS: G-[LIVMFYW](4).

NAME: Glycosyl hydrolases family 2 acid/base catalyst.

CONSENSUS: [DENQF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.
NAME: Glycosyl hydrolases family 3 active site. 

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-
CONSENSUS: [SGADNI].

NAME: Glycosyl hydrolases family 5 signature. 

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].

NAME: Glycosyl hydrolases family 6 signature 1. 

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2. 

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].
NAME: Glycosyl hydrolases family 8 signature. 

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].
NAME: Glycosyl hydrolases family 9 active sites signature 1. 

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2. 
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA].

NAME: Glycosyl hydrolases family 10 active site. 

CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1.
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.

CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].

NAME: Glycosyl hydrolases family 16 active sites.

CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].

NAME: Glycosyl hydrolases family 17 signature. 

CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].
NAME: Glycosyl hydrolases family 25 active sites signature.

CONSENSUS: D-[LIVM]-x(3)-[NQ]-[PG]-x(9,10)-G-x(4)-[LIVMFY](2)-K-x-[ST]-E-[GS]-x(2)-
CONSENSUS: Y-x-[DN].

NAME: Glycosyl hydrolases family 31 active site. 
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2. 

CONSENSUS: G^AV].D-(LIVMTl<:<^[FYl-x(3HSTl.x(3)-L<:-x.R-W-xa)-(LV)-[GS]-(^^ 
CONSENSUS: F-x-P-F-x-R-[DNl. 

NAME: Glycosyl hydrolases family 32 active site. 
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.

NAME: Glycosyl hydrolases family 35 putative active site.
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].
NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site. 
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA].

NAME: Prokaryotic transglycosylases signature.
CONSENSUS* ^^]^|^j^*^^^^>*^-^»(3HAP].x(3)-S^ 

NAME: Inosine-uridine preferring nucleoside hydrolase family signature.
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME: Alkylbase DNA glycosidases alkA family signature.

CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D.
NAME: Formamidopyrimidine-DNA glycosylase signature. 

CONSENSUS; C-x(2.4)<:.x-[GTA(a-x-[IV).x(7VR.(GSTAN] [STA]-x.fI^^-^^^ 

NAME: Uracil-DNA glycosylase signature. 

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y.

NAME: S-adenosyl-L-homocysteine hydrolase signature 1.

CONSENSUS: [CS]-N-x-[FYL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2.
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A.

NAME: Cytosol aminopeptidase signature. 
CONSENSUS: N-T-D-A-E-G-R-L. 

NAME: Aminopeptidase P and proline dipeptidase signature 

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].
NAME: Methionine aminopeptidase subfamily 1 signature.

CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV].
NAME: Methionine aminopeptidase subfamily 2 signature 

CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].
NAME: Renal dipeptidase active site. 

CONSENSUS: fUVM)-E-G-[GAJ-x(2HUVMF)-x(6)-L-x(3)-Y-x(2)-G-[LIV\^ 

NAME: Serine carboxypeptidases, serine active site.
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site.

OTNSeJsUS; |{:^J'■''0>-n^STA^x-lIVPSn-x-[GSDNQI,HSAGV]^SG^H-x^WAQJ-p.^^^ 

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature.

TOnSs: [[|^-^M'^-»*IVMinn-x(4)-H-|STAG)-,.E-x-(^ 

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature.
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site 
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site 

CoSus; |^^|^^'•W-«2)^^DEl-S^tOSHSAPHV^(UVMFY^ 

NAME: Serine proteases, subtilase family, aspartic acid active site 

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site.

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

NAME: Serine proteases, subtilase family, serine active site.
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].
NAME: Serine proteases, V8 family, histidine active site.

CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site. 
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY].

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T. 

NAME: Serine proteases, omptin family signature 2.

CONSENSUS: A-G-Y-Q-E-[ST]-R-[FYW]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y.
NAME: Prolyl endopeptidase family serine active site. 

CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2).
NAME: Endopeptidase Clp serine active site.

CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].

NAME: Endopeptidase Clp histidine active site.

CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site.
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site. 

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].
NAME: Eukaryotic thiol (cysteine) proteases asparagine active site. 

CONSENSUS: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-
CONSENSUS: [LIVMFYG]-x-[LIVMF].

NAME: Ubiquitin carboxyl-terminal hydrolase family I cysteine active-site. 
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1. 

CONSENSUS: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-

CONSENSUS: Q.

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site. 

CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site. 
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G.

NAME: Eukaryotic and viral aspartyl proteases active site. 

CONSENSUS: [LIVMFGACHUVMTADN]-IUVFSA]-D-(S^^-G-[STAV^[STAPDEN(M-x-lUVMFSTO 
CONSENSUS: x-[LIVMFGTA]. 

NAME: Neutral zinc metallopeptidases, zinc-binding region signature.

CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch. 

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature.

CONSENSUS: G-x(8.9)-G-x.[STA]-H-IUVMFY]-[LIVMC]-[DERN)-[HRKL]-[LMFATl.x-[LF^^ 
CONSENSUS: [GSTAN]-[GST|. 

// 

AC PS01016; 

DE Glycoprotease family signature. 

CONSENSUS: [KR]-[GSAT]-x(4)-[FYWHL]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-
CONSENSUS: [LIVM].

NAME: Proteasome A-type subunits signature.

CONSENSUS: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-
CONSENSUS: [SAG].
NAME: Proteasome B-type subunits signature.

CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
CONSENSUS: [GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.

NAME: Signal peptidases I serine active site.
CONSENSUS: [GS]-x-S-M-x-[PS]-[AT]-[LF].

NAME: Signal peptidases I lysine active site.

CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY].
NAME: Signal peptidases I signature 3.

CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG].
NAME: Signal peptidases II signature.

CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].
NAME: Peptidase family U32 signature.

CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.
NAME: Amidases signature.

CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-
CONSENSUS: x-[GA]-x-S-[LIVM]-R-x-P-[GSAC].

NAME: Asparaginase / glutaminase active site signature 1.
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2.
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM].

NAME: Urease nickel ligands signature.

CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.
NAME: Urease active site.

CONSENSUS: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A.

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1.
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV].

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2.

CONSENSUS: [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-[LIVMA]-x(2)-[LIVMFY]-x(14,17)-[LIVM]-
CONSENSUS: x-[LIVMF]-[LIVMSTAG]-[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN].

NAME: Dihydroorotase signature 1.

CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGN].

NAME: Dihydroorotase signature 2.
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K.

NAME: Beta-lactamase class-A active site.

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC].

NAME: Beta-lactamase class-C active site.
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K.

NAME: Beta-lactamase class-D active site.

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

NAME: Beta-lactamases class B signature 1.

CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2.

CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.

CONSENSUS: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA].

NAME: Arginase family signature 2.

CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D.

NAME: Arginase family signature 3.

CONSENSUS: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G.
NAME: Adenosine and AMP deaminase signature.
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature.

CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM].

NAME: GTP cyclohydrolase I signature 1.

CONSENSUS: [EN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVM]-x(3)-[ST]-x-C-E-H-H.

NAME: GTP cyclohydrolase I signature 2.

CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].

NAME: Nitrilases / cyanide hydratase signature 1.

CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.

NAME: Nitrilases / cyanide hydratase active site signature.

CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature.

CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC].

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R.

NAME: Acylphosphatase signature 2.

CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G.

NAME: ATP synthase alpha and beta subunits signature.
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S.

NAME: ATP synthase gamma subunit signature.
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature. 

CONSENSUS: (UVM)-x-[U\^l^.x(3)-fUVMTl-n>ENQKl-x(2HUVM]-x^GSA]-G-(LI^ 
CONSENSUS: x-[LIVM]-[KRHENQ]-x-[GSEN].

NAME: ATP synthase a subunit signature. 

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT].
NAME: ATP synthase c subunit signature.

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site.
CONSENSUS: D-K-T-G-T-[LIVM]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1.

CONSENSUS: [FYW]-x(2)-[FYW]-x-[FYW]-[DN]-x(6)-[LIVM]-G-R-T-x(3)-W.

NAME: Sodium and potassium ATPases beta subunits signature 2.
CONSENSUS: [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G.

NAME: GDA1/CD39 family of nucleoside phosphatases signature.
CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site.
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F.

NAME: Cutinase, serine active site.

CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G.

NAME: Cutinase, aspartate and histidine active sites. 

CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site. 
CONSENSUS: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-(C^^^
CONSENSUS: x(2)-[RK]. 

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site.
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2).

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site.
CONSENSUS: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMT
CONSENSUS: [GTE].
NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.

CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-
CONSENSUS: [GSTPCEQ].

NAME: Orotidine 5'-phosphate decarboxylase active site.

CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1.
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH].

NAME: Phosphoenolpyruvate carboxylase active site 2.
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G.

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature.
CONSENSUS: F-P-S-A-C-G-K-T-N.

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature.
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N.

NAME: Uroporphyrinogen decarboxylase signature 1.
CONSENSUS: P-x-W-x-M-R-Q-A-G-R.

NAME: Uroporphyrinogen decarboxylase signature 2.

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK].
NAME: Indole-3-glycerol phosphate synthase signature. 

CONSENSUS: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST].

NAME: Ribulose bisphosphate carboxylase large chain active site.

CONSENSUS: G-x-[DN]-F-x-K-x-D-E.

NAME: Fructose-bisphosphate aldolase class-I active site.
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN].

NAME: Fructose-bisphosphate aldolase class-II signature 1.

CONSENSUS: [FYVM]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH].

NAME: Fructose-bisphosphate aldolase class-II signature 2.
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature.

CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F.

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site.
CONSENSUS: S-V-A-G-L-G-G-C-P-Y.

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site.
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature.

CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1.
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2.
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site.
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue.
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature.
CONSENSUS: K-[KR]-C-G-H-[LMQ].

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site.
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1.

CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F.

NAME: DNA photolyases class 2 signature 2.

CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N.
NAME: Eukaryotic-type carbonic anhydrases signature.

CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1.
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2.

CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature. 
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N. 

NAME: Aconitase family signature 1.

CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-^^^^
CONSENSUS: [LIVMA]. 

NAME: Aconitase family signature 2. 

CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2.
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site.

CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].

NAME: Dehydroquinase class II signature.

CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G.
NAME: Enolase signature.

CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site.

CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature. 

CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-^^
CONSENSUS: [DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1.

CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2.
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature. 

CONSENSUS: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[D^^

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site.
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N.

NAME: Delta-aminolevulinic acid dehydratase active site.
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y.

NAME: Urocanase active site.
CONSENSUS: F-Q-G-L-P-x-R-I-C-W.

NAME: Prephenate dehydratase signature 1. 

CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMrW0-^^^

NAME: Prephenate dehydratase signature 2. 
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].

NAME: Dihydrodipicolinate synthetase signature 2.

CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-
CONSENSUS: K-[DEQAF]-[STAC].

NAME: RsuA family of pseudouridine synthase signature.
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site.
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature.

CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].

NAME: Porphobilinogen deaminase cofactor-binding site.

CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].

NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site.

CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x^-l^

NAME: Glyoxalase I signature 1.

CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G^
NAME: Glyoxalase I signature 2.

CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E.

NAME: Cytochrome c and c1 heme lyases signature 2.

CONSENSUS: P-F-D-R-H-D-W.

NAME: Adenylate cyclases class-I signature 1.
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K.

NAME: Adenylate cyclases class-I signature 2.

CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.

NAME: Guanylate cyclases signature.

CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-n^
CONSENSUS: [DNTA]-x(5)-[DE].

NAME: Chorismate synthase signature 1.

CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV].
NAME: Chorismate synthase signature 2.

CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.
NAME: Chorismate synthase signature 3.

CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIV^

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1.
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y.

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2.
CONSENSUS: D-H-K-N-L-D-x-D.

NAME: Ferrochelatase signature.

CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site.
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G.

NAME: Aspartate and glutamate racemases signature 1.

CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].

NAME: Aspartate and glutamate racemases signature 2.

CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LI^

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1.
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.


CONSENSUS: G-x(7)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].
NAME: Ribulose-phosphate 3-epimerase family signature 1.

CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].

NAME: Ribulose-phosphate 3-epimerase family signature 2.

CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site.
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature.

CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.
NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.

CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].
NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.

CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-
CONSENSUS: x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile.

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature.

CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-
CONSENSUS: [GS].

NAME: Triosephosphate isomerase active site.
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1. 

CONSENSUS: [LI]-E-P-K-P-x(2)-P.

NAME: Xylose isomerase signature 2.

CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].

NAME: Phosphomannose isomerase type I signature 1.
CONSENSUS: Y-x-D-x-N-H-K-P-E.

NAME: Phosphomannose isomerase type I signature 2.

CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.
NAME: Phosphoglucose isomerase signature 1.

CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.

NAME: Phosphoglucose isomerase signature 2.

CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature.

CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature.
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N.

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature.
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature.

CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-

CONSENSUS: G-S.

NAME: Terpene synthases signature.

CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site.

CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site.

CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature.
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].
NAME: Aminoacyl-transfer RNA synthetases class-I signature.

CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-
CONSENSUS: [LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2.

CONSENSUS: [GSTALV^-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]^^^
NAME: WHEP-TRS domain signature.

CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-ro^
CONSENSUS: x(2)-[IV]-x(2)-L-x(3)-K.

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1.

CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]^
CONSENSUS: G-D.

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site.
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3.

CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LIV]-[STAV]-x(4)-D-x-[LIV^
CONSENSUS: G-[GRE].

NAME: Glutamine synthetase signature 1. 

CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature.
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

NAME: Glutamine synthetase class-I adenylation site.
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.

NAME: D-alanine--D-alanine ligase signature 1.

CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine--D-alanine ligase signature 2.

CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1.

CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.

NAME: SAICAR synthetase signature 2.

CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.

NAME: Folylpolyglutamate synthase signature 1. 

CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIV^
NAME: Folylpolyglutamate synthase signature 2. 

CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1.
CONSENSUS: K-A-C-S-G-K-F-x-P.

NAME: Ubiquitin-activating enzyme active site.
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site. 

CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LI^

NAME: Formate-tetrahydrofolate ligase signature 1.
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y.

NAME: Formate-tetrahydrofolate ligase signature 2.
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site.
CONSENSUS: Q-W-G-D-E-G-K-G.

NAME: Adenylosuccinate synthetase active site.
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.
NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S.

NAME: Argininosuccinate synthase signature 2.
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F.

NAME: Phosphoribosylglycinamide synthetase signature.
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1.

CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG].
NAME: Carbamoyl-phosphate synthase subdomain signature 2.

CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].

NAME: ATP-dependent DNA ligase AMP-binding site.
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM].

NAME: ATP-dependent DNA ligase signature 2.

CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1 . 

CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G-
CONSENSUS: [DE]-[DENL].

NAME: NAD-dependent DNA ligase signature 2.

CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VTI]-x(2)-L-x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature.

CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y.

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL].

NAME: Isopenicillin N synthetase signature 2.

CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site.
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.

NAME: Site-specific recombinases signature 2.

CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].
NAME: Transposases, Mutator family, signature.

CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H.

NAME: Transposases, IS30 family, signature.

CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.
NAME: Autoinducers synthetases family signature.

CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D.
NAME: Thiamine pyrophosphate enzymes signature.

CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].
NAME: Biotin-requiring enzymes attachment site.

CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-

CONSENSUS: [SAV].

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site.
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature. 

CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: (LIVMl(3)-[Lm(2)-G-G-T-G-x(4)-D. 
NAME: Molybdenum cofactor biosynthesis proteins signature 2.
NAME: moaA / nifB / pqqE family signature.

CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.
NAME: Radical activating enzymes signature.

CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].

NAME: Tpx family signature.

CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature.
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}.

NAME: Cytochrome b5 family, heme-binding domain signature.
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature.

CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H.

NAME: Cytochrome b/b6 Qo site signature.
CONSENSUS: P-[DE]-W-[FY]-[LFY](2).

NAME: Cytochrome b559 subunits heme-binding site signature.

CONSENSUS: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-[LIV]-[STGA]-[IV]-^^

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.

CONSENSUS: R-P-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[LIVMF]-[STAC]-[LIVM]-x(2)-L-x-[LI^

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2.

CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1.

CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2.

CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA].

NAME: Thioredoxin family active site.

NAME: Glutaredoxin active site.

CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].
NAME: Type-1 copper (blue) proteins signature.

CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].
NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG].

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1.
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2.
CONSENSUS: C-P-C-H-x-[GSA].

NAME: Flavodoxin signature.

CONSENSUS: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV].

NAME: Rubredoxin signature.

CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].
NAME: Electron transfer flavoprotein alpha-subunit signature.

CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-
CONSENSUS: [IV]-N.

NAME: Electron transfer flavoprotein beta-subunit signature.

CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC].

NAME: Vertebrate metallothioneins signature.

CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.

NAME: Ferritin iron-binding regions signature 1.

CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R.
NAME: Ferritin iron-binding regions signature 2.

CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN].
NAME: Bacterioferritin signature.

CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.
NAME: Transferrins signature 1.

CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].
NAME: Transferrins signature 2.

CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW].
NAME: Transferrins signature 3.

CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile.

NAME: Protozoan/cyanobacterial globins signature.

CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature.
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F.

NAME: Hemerythrins signature.
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F.

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P.

NAME: Arthropod hemocyanins / insect LSPs signature 2.
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW].

NAME: Heavy-metal-associated domain.

CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS].

NAME: ABC transporters family signature.

CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
CONSENSUS: [KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign.

CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR].

NAME: ABC-2 type transport system integral membrane proteins signature.

CONSENSUS: [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-[GSAIV]-x(6)-[LIMGA]-[PGSNQ]-
CONSENSUS: x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature.

CONSENSUS: [GAP]-[LIVMFA]-[STAVDN]-x(4)-[GSAV]-[LIVMFY](2)-Y-[ND]-x(3)-[LIV^
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature.

CONSENSUS: G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-[VAGC]-x(2)-[LIVMAG^
NAME: Bacterial extracellular solute-binding proteins, family 5 signature.

CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]-x-[LIVMFY]-x-[LIVM]-[KR]-
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].
NAME: Serum albumin family signature.

CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].

NAME: Transthyretin signature 1.

CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G.

NAME: Transthyretin signature 2.

CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.
NAME: Avidin / Streptavidin family signature.

CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature.
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature.

CONSENSUS: j^°j^|f''ENQCOTARKl-M0.2)-[DENQAIUq-[LIVnrHCP)^-{^

NAME: Cytosolic fatty-acid binding proteins signature.

CONSENSUS: (^^•^^'^•»■I"VM^-x(4HNHG]^n^-(DE)-x-[LIVMFY]^^^ 

NAME: Acyl-CoA-binding protein signature.

CONSENSUS: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G.
NAME: LBP / BPI / CETP family signature. 

CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[E^^
CONSENSUS: x(8)-P.

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature.

CONSENSUS: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]
CONSENSUS: [DN]-C-x(2)-[LIVM].

NAME: Uteroglobin family signature 1.

CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-[LIVMF](2).
NAME: Uteroglobin family signature 2.

CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C.

NAME: Mitochondrial energy transfer proteins signature.

CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QMAIGV].

NAME: Sugar transport proteins signature 1.

CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-

CONSENSUS: [GSTA].

NAME: Sugar transport proteins signature 2.

CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.
CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W.

NAME: LacY family proton/sugar symporters signature 2.

CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1.

CONSENSUS: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-
CONSENSUS: [IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA].

NAME: PTR2 family proton/oligopeptide symporters signature 2.
CONSENSUS: lFYTJ-x(2)-{LMFY)-[ITVl-[UVMFYWA]-x^IVG]-N-[L^ 

NAME: Amiloride-sensitive sodium channels signature. 

CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]-C-x-[QT]-x(2)-[LIVMT]-[LIVMS]-x(2)-C-x-C.
NAME: Sodium:alanine symporter family signature.

CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LI^
NAME: Sodium:dicarboxylate symporter family signature 1.

CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.
NAME: Sodium:dicarboxylate symporter family signature 2.

CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FY]-[LI]-[SA]-Q.

NAME: Sodium:galactoside symporter family signature.

CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1.
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2.

CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-x(2)-C-x(4)-N-[GST].

NAME: Sodium:solute symporter family signature 1.

CONSENSUS: [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-[TAV]-x(2)-G-G-aM
CONSENSUS: [SAP].

NAME: Sodium: solute symporter family signature 2. 

CONSENSUS: [GASTI-[LIVMJ-x(3^(KR)-x(4)^-A.x(2)-IGAS)-aiVMGSJ-[IJVMW].(UVMGAT)-G. 
CONSENSUS: x-tUVMCJ. 

NAME: Sodium:suIfate symporter family signature. 

CONSENSUS: [STACPl.S-x<2)-F-x(2>.p-{LIVMHGSA)-x(3VN-x-[UVMl.V. 

NAME: glpT family of transporters signature. 
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G. 

NAME: Ammonium transporters signature.

CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R.

NAME: BCCT family of transporters signature. 
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W.

NAME: Flagellar motor protein motA family signature.
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1.

CONSENSUS: [UVMA]-rLIVMY)-x^-[GSTA)-rDESJ-L-[FIl-fPNMOS). 

NAME: Formate and nitrite transporters signature 2. 
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1. 
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S.

NAME: Prokaryotic sulfate-binding proteins signature 2.
CONSENSUS: N-P-K-[ST]-S-G-x-A-R.

NAME: Sulfate transporters signature. 

CONSENSUS: Px.Y.[GS)-L-Y.[STAG)(2)-x(4).[LIVMFYK3).x(3>-[GSTAl(2)^(KR]. 
NAME: Amino acid permeases signature.

CONSENSUS: rSTAGC]<;-fPAG]-x(2,3)-[LIVMFYWA]C2)-x-[UVMFYW)-x-[LIVMFWSTAGa(2V 
CONSENSUS: (STAGC)-x(3)-[LIVMFirW]-x-fUVMSTl.x(3)-ILIMCTA].IGA]-^^^^ 

NAME: Aromatic amino acids permeases signature. 

CONSENSUS: I-G-(GAK5-M-(LFl-[SAJ-x.p.x(3HSA]^-x(2>-F. 

NAME: Xanthine/uracil permeases family signature.

CONSENSUS: tUVM)-P.x-[PASIF)-V.[lJVMJ-G-G-x(4)-[LIVMHFY]-IGSAh^^^ 

NAME: Anion exchangers family signature 1.

CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2. 
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.

NAME: MIP family signature.

CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].
NAME: General diffusion Gram-negative porins signature.

CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.
NAME: OmpA-like domain. 

CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)-
CONSENSUS: [LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature.

CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature. 
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C.

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P.[AVl-P-[LF]-G-L-x.(CSA].F. 

NAME: GNS1/SUR4 family signature. 
CONSENSUS: L-x-F-L-H-x-Y-H-H.

NAME: 43 Kd postsynaptic protein signature. 
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I. 

NAME: Actins signature 1.

CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.

NAME: Actins signature 2. 

CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].
NAME: Actins and actin-related proteins signature.

CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].
NAME: Annexins repeated domain signature.

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature. 
CONSENSUS: F-E-D-V-I-A-E-P. 

NAME: Clathrin light chain signature 1. 
CONSENSUS: F-L-A-Q-Q-E-S. 

NAME: Clathrin light chain signature 2. 

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1.
CONSENSUS: C-K-P-C-L-K-x-T-C. 

NAME: Clusterin signature 2. 

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C.

NAME: Connexins signature 1.

CONSENSUS: C-[PN]-T-x-Q-P-G-C-x(2)-V-C-Y-D. 

NAME: Connexins signature 2. 

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.
NAME: Crystallins beta and gamma 'Greek key' motif signature.

CONSENSUS: [UVMFYWA^x-{DEHRKSTP}^FY)-[DEQHKYl-x(3)-^Tl-x-G-x(4HUVMrc 
NAME: Dynamin family signature.

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R.

NAME: Dynein light chain type 1 signature.

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1. 

CONSENSUS: N-[STI-D-x-Q-x-L-x(l6,18).G-x-G-[ATV]-G-[GSAN]-x-P-x(2H5. 
NAME: FtsZ protein signature 2. 

CONSENSUS: tDNHKR].[UVMF)-x.(LIVMF]a)-[VSTACJ-[STAC]-G-x-G-[GK]-G-T-G.[ST|.^^ 
CONSENSUS: IGSAR]-(STA]-P-[LIVMFn-[LIVMFl-[SGAVl. 
NAME: Fungal hydrophobins signature.

CONSENSUS: tGN]-lDNQPSA]-x-C-IGSTANKl-(GSTADNQ]-ISTNQO-lFnV].x-C<:.[DENQKPSTl. 

NAME: Intermediate filaments signature.

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE].

NAME: Involucrin signature.

CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV].
NAME: Kinesin motor domain signature. 

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.
NAME: Kinesin motor domain profile.

NAME: Kinesin light chain repeat.

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3^G-x-fDNS]-x-P-x-V-A-x(3)-N-x-^[ASJ- 
CONSENSUS: x(5HQRJ-x-tKR]-[FY3-x(2HAVl.x(4)-[HKNQ]. 

NAME: Myelin basic protein signature. 
CONSENSUS: V-V-H-F-F-K-N. 

NAME: Myelin P0 protein signature.

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1. 
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H.

NAME: Myelin proteolipid protein signature 2. 

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1.
CONSENSUS: <M-L-C-C-[LIVM]-R-R.

NAME: Neuromodulin (GAP-43) signature 2.
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature.

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.

NAME: Peripherin / rom-1 signature.

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C.

NAME: Profilin signature.

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites.
CONSENSUS: I-P-C-C-P-V. 

NAME: Synapsins signature 1. 
CONSENSUS: L-R-R-R-L-S-D-S. 

NAME: Synapsins signature 2. 
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K. 

NAME: Synaptobrevin signature. 

CONSENSUS: N.[LIVM)-fDENSJ.^IaJ-V-x-[DE(a-R-xaHKR^[UVM)^STO 
CONSENSUS: [KR>[TA1-(DE]. 

NAME: Synaptophysin / synaptoporin signature.
CONSENSUS: L-S-V-[DE]-C-x-N-K-T.

NAME: Tropomyosins signature. 
CONSENSUS: L-K-E-A-E-x-R-A-E. 

NAME: Tubulin subunits alpha, beta, and gamma signature. 
CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal. 
CONSENSUS: <M-R-[DE]-[IL]. 

NAME: Tau and MAP proteins tubulin-binding domain signature. 
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature.
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGCI].

NAME: F-actin capping protein alpha subunit signature 1. 
CONSENSUS: V-H-[FY](2)-E-D-G-N-V.

NAME: F-actin capping protein alpha subunit signature 2. 
CONSENSUS: F-K-[AE]-L-R-R-x-L-P.

NAME: F-actin capping protein beta subunit signature.
CONSENSUS: C-D-Y-N-R-D. 

NAME: Vinculin family talin-binding region signature. 

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L.

NAME: Vinculin repeated domain signature. 
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature. 
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P.

NAME: Amyloidogenic glycoprotein intracellular domain signature.
CONSENSUS: G-Y-E-N-P-T-Y-[KR].

NAME: Cadherins extracellular repeated domain signature.
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

NAME: Insect cuticle proteins signature. 

CONSENSUS: G-x(7).[DENl-G-x(6^Y-x-A-[DNG^x{2,3)-G-[FY]-x-tAP]. 
NAME: Gas vesicles protein GVPa signature 1. 

CONSENSUS: rUVMl-x-[DE].[LIVMFYTl-fLIVMl-[DEl.x^UVM]{2)-n5KRl{2)^^^ 

NAME: Gas vesicles protein GVPa signature 2. 

CONSENSUS: R[LIVA](3).A-IGSI-[LIVMFy].x-T^x(3).Y-[AG]. 

NAME: Gas vesicles protein GVPc repeated domain signature.
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F.

NAME: Bacterial microcompartments proteins signature.

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA].

NAME: Flagella basal body rod proteins signature.

CONSENSUS- [^^Q'-''<^>-P-'^YSTAja)-IGSTA]^STADEN]-N-[LIVM)-ISA^ 
NAME: Flagella transport protein fliP family signature 1.

CONSENSUS: [PA^A.IFY]-x-[UVTI-[STH]-[E(a-{L^-x(2HGA^F-(KREQl.[^ 

NAME: Flagella transport protein fliP family signature 2.
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature.
CONSENSUS: IFYW)-x-nOTAJ.x(7).G-x-[LIVM].xH[LIVMl-x-[FYW^ 

NAME: Potexviruses and carlaviruses coat protein signature.

CONSENSUS: [RKJ-[FYWl-A-IGAP3-F-I>.x^F-x(2)-[LV)-x(3HGASTia). 

NAME: Neurotransmitter-gated ion-channels signature. 
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature. 

CONSENSUS: G<J-x-aJVMl^-(UVM)-x.nV)-x-W-x-C-D)Nl-L.I>x(5)^-x-^^^^ 
NAME: G-protein coupled receptors signature. 

CONSENSUS: lGSTALIVMFYWC]-[GSTANCPDEJ-{EDPKRH}.x(2)-[UVMNQGA^x(2)-aiVMFTl- 
CONSENSUS: [GSTANC].[UVMIWSTACl-PENH).R.[FYWCafl M^HLIVMPH 

NAME: G-protein coupled receptors family 2 signature 1.

CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-x(8,9)-C-[PF].

NAME: G-protein coupled receptors family 2 signature 2. 
CONSENSUS: Q-G-[U4FCA3-[LIVMFn-[UVl.x.[UVFSTh[UIl-^ 
NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN].
NAME: G-protein coupled receptors family 3 signature 2. 

CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3.
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.

NAME: Visual pigments (opsins) retinal binding site.

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-
CONSENSUS: [AP]-x(2)-[IY].

NAME: Bacterial rhodopsins signature 1.

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).

NAME: Bacterial rhodopsins retinal binding site. 

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature. 
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R.

NAME: Receptor tyrosine kinase class III signature.
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T.

NAME: Receptor tyrosine kinase class V signature 1. 

CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-
CONSENSUS: x(3)-[KR]-C-[PSAW].

NAME: Receptor tyrosine kinase class V signature 2. 

CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1.
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W.

NAME: Growth factor and cytokines receptors family signature 2.
CONSENSUS: [STGL]-x-W-[SG]-x-W-S.

NAME: TNFR/NGFR family cysteine-rich region signature.

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C.

NAME: TNFR/NGFR family cysteine-rich region domain.

NAME: Integrins alpha chain signature. 
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R. 

NAME: Integrins beta chain cysteine-rich domain signature.
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature. 
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W.

NAME: Photosynthetic reaction center proteins signature. 

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature.

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW].

NAME: Antenna complexes beta subunits signature. 

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature.
CONSENSUS: C-D-G-P-G-R-G-G-T-C. 

NAME: Photosystem I psaG and psaK proteins signature.

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature.
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile.
NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G.
NAME: TonB-dependent receptor proteins signature 1.

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].
NAME: TonB-dependent receptor proteins signature 2. 

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F.

NAME: Transmembrane 4 family signature. 

CONSENSUS: G.x(3).fLIVMFl.x{2)-[GSAl-[UVMF](2)^-x.[GA]-[STA]-x(2)-fEGl-^^^^^ 
CONSENSUS: (CWN1.(LIVM](2). 

NAME: Bacterial chemotaxis sensory transducers signature. 

CONSENSUS: R-T-E.[EQ).(^x(2HSA].ILIVM].x-[EQl-T-A-A-S-M-E^J-L-T-A-T-V 

NAME: ER lumen protein retaining receptor signature 1.
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2.
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L.

NAME: Ephrins signature.

NAME: Granulins signature. 

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C.

NAME: HBGF/FGF family signature.

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y.
NAME: PTN/MK heparin-binding protein family signature 1.

CONSENSUS: S-[DEJ<:-x pE]-W-x-W-x(2)-C-x.P-x-[SN]-x-D<:-G.[UVMAl^x.R.E<5. 
NAME: PTN/MK heparin-binding protein family signature 2.

CONSENSUS: C^).[LIVM]-P<:-N-W-K-K x-F G A-iDEl<:-K-Y-x-F-[EQ^^^^ 

NAME: Nerve growth factor family signature.

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature.
CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C.

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature.

CONSENSUS: C-x-C.[LIVM]-x(5.6)-(LIVMFY^xCHWCSE<a-x-^LIVMl-xa^^JVM].x(^ 

CONSENSUS: (SAGJ-x(2)-C-x(3HEQJ-[LIVM1(2)-x(9,10H:.L-[DN). 

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature.

CONSENSUS: C^.[UFYT].x(5.6>fL0.x(4^[LIVMI^-x(2HFYW^x(6.8K-x(3.4HSAGl- 

CONSENSUS: [LIVM)(2).[FL].x(8K:-CSTAj. . /a. u.^m^aoj 

NAME: TGF-beta family signature.

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.

NAME: TNF family signature.

TONslJJsUS- [{;J(J;J"^^W^*<3><34UVMF1^^^ 

NAME: TNF family profile. 

NAME: Wnt-1 family signature. 

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C.

NAME: Interferon alpha, beta and delta family signature.

CONSENSUS: [FYH]-[FY]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature.
CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.

NAME: Interleukin-1 signature.

CONSENSUS: rrci-x-S-|ASLV]-x(2VP-x(2)-[FyLIVl-[U].(SCA]-T-x(7H^^ 
NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L.

NAME: Interleukins -4 and -13 signature. 

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature.
CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L. 

NAME: Interleukin-7 and -9 signature.
CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L.

NAME: Interleukin-10 family signature. 

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.
NAME: LIF / OSM family signature.

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK].

NAME: Macrophage migration inhibitory factor family signature.
CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.

NAME: Adipokinetic hormone family signature. 
CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W.

NAME: Bombesin-like peptides family signature.
CONSENSUS: W-A-x-G-[SH]-[LF]-M.

NAME: Calcitonin / CGRP / IAPP family signature.

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].
NAME: Corticotropin-releasing factor family signature.

CONSENSUS: [PQ]-x-[LIVM]-S-[LIVM]-x(2)-[PST]-[LIVMF]-x-[LIVM]-L-R-x(2)-[LIVM].

NAME: Crustacean CHH/MIH/GIH neurohormones family signature.
CONSENSUS: C-[DENK]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C.

NAME: Erythropoietin / thrombopoietin signature.
CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C.

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L.

NAME: Granins signature 2. 

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.
NAME: Galanin signature. 

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature.
CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature. 

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-
CONSENSUS: [LIVMFYG]-x(9)-[KREQL]-[KRDENQL]-[LVFYWC]-[LIVQ].

NAME: Glycoprotein hormones alpha chain signature 1.
CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2. 
CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K.

NAME: Glycoprotein hormones beta chain signature 1.
CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST].

NAME: Glycoprotein hormones beta chain signature 2.

CONSENSUS: [PA]-V.A.x(2)-C.x-C-x(2)-C-x(4).[STDHDEYl^-x(6.8HPGSTAVM]-x(2)-C. 

NAME: Gonadotropin-releasing hormones signature.
CONSENSUS: Q-H-[FYW]-S-x(4)-P-G.

NAME: Insulin family signature. 

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
NAME: Natriuretic peptides signature.
CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C.

NAME: Neurohypophysial hormones signature. 
CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.

NAME: Neuromedin U signature. 
CONSENSUS: F-[LIVMF]-F-R-P-R-N.

NAME: Endogenous opioids neuropeptides precursors signature.

C.x(3K-x(2K.x(2).[KRH).x(6J)-[LIFJ.[DN^x(3)<;.x-^^ 
CONSENSUS: (EQ]-x(8).W-x<2)<:. J i 

NAME: Pancreatic hormone family signature.

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF].

NAME: Parathyroid hormone family signature.
CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G.

NAME: Pyrokinins signature. 
CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1.

CONSENSUS: C-x^STJ-xaHLIVMFY].x-[LIVMSTAJ-P-x(5^lTALIV].x(7>.(L!VMI^.x(^^ 
CONSENSUS: rUVMFY).x(2HSTA]-W. J xWHi-iVMhYj x(b). 

NAME: Somatotropin, prolactin and related hormones signature 2. 

CONSENSUS: C-[UVMFY]-x(2>D-{LIVMFysrA].x(5).[mMFy]-x(2HLIVM^ 

NAME: Tachykinin family signature.
CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature. 
CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature. 
CONSENSUS: C-F-W-K-Y-C. 

NAME: Cecropin family signature. 

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature. 
CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C.

NAME: Arthropod defensins signature. 

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.
NAME: Cathelicidins signature 1.

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].
NAME: Cathelicidins signature 2.

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature.
CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature. 
CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C.

NAME: Gamma-thionins family signature. 

CONSENSUS: [KRJ-x-C.x(3HSVI-x(2HFYWH)-x^GF].x^.x<5H:-x(3H:. 
NAME: Snake toxins signature. 

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].
NAME: Myotoxins signature. 

CONSENSUS: K.x<:.H-x-K.x(2)-H-C-x(2)-K.x(3)^.x(8>K.xaH:-xa)-E^ 
NAME: Scorpion short toxins signature. 

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature.
CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C. 


NAME: Aerolysin type toxins signature.
CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature. 

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-
CONSENSUS: x(2)-[LIVMF].

NAME: Channel forming colicins signature. 
CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E.

NAME: Hok/gef family cell toxic proteins signature. 

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1.
CONSENSUS: Y-G-G-[LIV]-T-x(4)-N.

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.
CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature.
CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK].

NAME: Membrane attack complex components / perforin signature. 
CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature. 
CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.

NAME: Bowman-Birk serine protease inhibitors family signature. 

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature. 
CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C. 

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature. 
CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature.

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-
CONSENSUS: [LIVMFAH].

NAME: Potato inhibitor I family signature.

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature. 
CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C.

NAME: Streptomyces subtilisin-type inhibitors signature.
CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L.

NAME: Cysteine proteases inhibitors signature. 

CONSENSUS: lGSTEQKRV]-Q-[UVT).[VAFl-ISAGQ)-G-x-(UVMNK)-x(2)-[LIVMFY3-x-[^^ 
CONSENSUS: [DENQKRHSIV). 

NAME: Tissue inhibitors of metalloproteinases signature.
CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C. 

NAME: Cereal trypsin/alpha-amylase inhibitors family signature.

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature.
CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature. 

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.
CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L.

NAME: Chaperonins cpn60 signature.
CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA].

NAME: Chaperonins cpn10 signature.

CONSENSUS: [UVMFn-x-P-[n-Tl-x-[DENHKR]-rLIVMFAI(3).lKREQ]-x(8,9HSG]-x- 
CONSENSUS: [LIVMFY](3).

NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2).
NAME: Chaperonins TCP-1 signature 2.

NAME: Chaperonins TCP-1 signature 3.

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T.

NAME: Heat shock hsp20 proteins family profile.

NAME: Heat shock hsp70 proteins family signature 1.
CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2.
CONSENSUS: ™^^[^'^^^^i-^^>^^^^-'VMFS)-G-^^^^^ 

NAME: Heat shock hsp70 proteins family signature 3.

CONSENSUS: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA].

NAME: Heat shock hsp90 proteins family signature. 
CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1.

CONSENSUS: D-IAn-ISGA)-N-ILIVMF|(2>-K-fPD-x-L-x(2)-G. 
NAME: Chaperonins clpA/B signature 2.

toSuS: ^y}.';',''"'^'-'^*-''-^-l'-'VMFY]-x-E-[KRQ].x-[ST^ 
NAME: Nt-dnaJ domain signature. 

CONSENSUS: (FYl-x(2).[LIVMA].x(3HFYWHNTl-[DENQSAJ.x.L-x-{DNl<3HK^^^^ 

NAME: dnaJ domain profile. 

NAME: CXXCXGXG dnaJ domain signature.

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.
NAME: grpE protein signature. 

NAME: Bacterial type II secretion system protein C signature 

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L.
NAME: Bacterial type II secretion system protein D signature 

tGRI-[DEQKGl-(STVM].[UVMAl(3HGA].G-(LIVMFY]-x(ll).[UVM)-P- 
CONSENSUS: lLIVMFYWGSJ.[LIVMn-lGSAEl-x-[LIVM]-P.[LIVMFYWl(2W^^ 

NAME: Bacterial type II secretion system protein E signature 
CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature 

NAME: Bacterial type II secretion system protein N signature.
CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature. 
NAME: Protein secA signatures. 

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.
NAME: Protein secY signature 1.

CONSENSUS: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-
CONSENSUS: [LIVMFAT](3)-Q-[LIVMFA](2).
NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-
CONSENSUS: [LIVMF](3).

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-
CONSENSUS: [LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature.

CONSENSUS: [UVMFYl.[APN).x-[DNSl.[KREQ]-E-[STR]-{UVMAR>x.[FYWT)-x^ 
CONSENSUS: x(2>-[LIVM)-P.tPAS). 

NAME: Fimbrial biogenesis outer membrane usher protein signature. 

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].

NAME: SRP54-type proteins GTP-binding domain signature.

CONSENSUS: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.
CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G.

NAME: Cyclin-dependent kinases regulatory subunits signature 1.

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2. 
CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR].

NAME: Pentaxin family signature.
CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature. 
CONSENSUS: [FY]-x-C-x-[VA]-x-H.

NAME: Prion protein signature 1. 

CONSENSUS: A-G-A-A-A-A-C-A-V-V-G-G-L-G-G-Y. 
NAME: Prion protein signature 2. 

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.
NAME: Cyclins signature. 

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-
CONSENSUS: [STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1.

CONSENSUS: lGAj-[UVMF].x-[lJVMA]-x-tSAV).[UVM]-D-x-(NSAE]-(HKR]^^ 
CONSENSUS: [VGA]-x.[LIVMl-x-[UVMl.x(4).F. 

NAME: Proliferating cell nuclear antigen signature 2.

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-
CONSENSUS: [LIVMF](2).

NAME: Actin-depolymerizing proteins signature.

CONSENSUS: P-IDEl-x.[SAl-x.[LIVMT3-[KR3-x-lKR]-M-[LIVM]-[YA].[STA](3M3)-^^ 
CONSENSUS: [KRJ. 

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2).
NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature.

CONSENSUS: fLVME]-[Fnx.tGSD]-[GL].x(1.2)-{NS)-[YWI<^R.nLIV]-[LIVCHGATJ 
CONSENSUS: [LIVMF](2)-x-F-(GSAEHGSARY]. 

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature.
CONSENSUS: W-ILIM]-x(3HGR)-G-[WQ]-(DENSAV].x-[FLGAHUVFrci. 

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature.

CONSENSUS: [lJVATl-x(3)-L-[KARQ^x-[IVALJ-G-D-[DESG^[UMFV3-(DENSHQ]-^^ 
CONSENSUS: [NSR]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature.
CONSENSUS: [DS]-[NTl-R-(AEh[UJ-V-x.[KD]-[FY)-(UV]-[GHSJ-Y.^^^ 
CONSENSUS: [HY)-x-ICWl. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile.
NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DE]-[LIVM](2)-G-[LIVM]-x-F-x-[RK]-[DEQ]-[LIVM].
NAME: AAA-protein family signature. 

^nMcnM^J!!- fl-'VMTl-x-|UVI^-[UVMF)-x4GATMCJ-[STWNS)-x(4)-[LIVMl-D-x-A-(UFAl 

CONSENSUS: x-R. 

NAME: Ubiquitin domain signature. 

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-
CONSENSUS: [LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile. 

NAME: ADP-ribosylation factors family signature.

NAME: GTP-binding nuclear protein ran signature. 
CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSE^Jsul' ^:^!^'^**^^-V-F-M<:-S-BJVM](2)-x^KR(a-x<3-Y-x-E^AGHHl-x-W^ 
NAME: Band 7 protein family signature. 

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-
CONSENSUS: [KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature.

SS!!!o!^!y^- ^LIVMSTAC].(LIVMFYWSTAGC^[UMSTAG]-[LIVMSTAGC].x(2HDN^.x(2). 
CONSENSUS: CLIVMWSTACl.x-[LIVMFSTAG)-W-[DEN]-[LIVMFSTAGCN]. 

NAME: G-protein gamma subunit profile. 

NAME: Ras GTPase-activating proteins signature.
CONSENSUS:

NAME: Ras GTPase-activating proteins profile. 

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature.

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[LIVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-
CONSENSUS: [DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature.
CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1.
CONSENSUS: G-Q-E-N-G-H-V-[KR].

NAME: MARCKS family phosphorylation site domain. 

CONSENSUS: E-T-P.K(5>x(0j^F.S-F.K.K-x.F-K-L.S-G.x-S.F-K-[KR]-p^^^ 

NAME: Stathmin family signature 1. 

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2. 
CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V.

NAME: GTP-binding elongation factors signature. 

SSusi ;^^^^Q™')■E^KRAQ^x-(RKQDHG(Wn.M«^^^ 

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.
CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G.

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.
CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1. 

CONSENSUS: L-R-x(2)-T-[GDQ]-x-[GS]-{UVMF]-x(0.1HDENKAq-x-K-[^ 
NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN].

NAME: Elongation factor P signature. 

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.
NAME: Eukaryotic initiation factor 1A signature. 

CONSENSUS: [IMl-x.G-x-IGS]-(KRH]x(4)-lCtJ-x-D-G-x(2)-R-x(2HRHl-I-x-G. 
NAME: Eukaryotic initiation factor 4E signature.

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature.
CONSENSUS: [PT]-G-K-H-G-x-A-K.

NAME: Initiation factor 2 signature.

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G.
NAME: Initiation factor 3 signature. 

CONSENSUS: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQT]-x(2)-[KR].

NAME: Translation initiation factor SUI1 signature.

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV].

NAME: Prokaryotic-type class I peptide chain release factors signature. 
CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].

NAME: Transcription termination factor nusG signature. 
CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat. 

CONSENSUS: [LIVM]-x-[LS]-Q-[MAS]-G-[STY]-[NT]-[KRQ]-x(2)-[STN]-Q-x-G-x(3,4)-G.

NAME: CAP protein signature 1. 

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E.

NAME: CAP protein signature 2. 

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.
NAME: Calreticulin family signature 1.

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2).

NAME: Calreticulin family signature 2. 
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].

NAME: Calreticulin family repeated motif signature.

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V.
NAME: Calsequestrin signature 2.

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.
NAME: S-100/ICaBP type calcium binding protein signature.

CONSENSUS: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-
CONSENSUS: [LIVMFS]-[LIVMF].

NAME: Hemolysin-type calcium-binding region signature.
CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature.

CONSENSUS: [UVM]-x(2)^-[LM].x(3HSTGAVJ.x.[UVMT]-x-[UVMTJ-[GEl-x-^^^ 
CONSENSUS: [LIVMFYW](2)-x-{LIVMFYW](3). 

NAME: P-II protein urydylation site.
CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y.

NAME: P-II protein C-terminal region signature.

CONSENSUS: [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM].

NAME: 14-3-3 proteins signature 1. 

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].
NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y.K-{DE].S-T.L4.[IMJ.Q-L-[LF]-nmq.D-N-[LIl-T.fUJ)-W-(^^ 

NAME: ATP1G1 / PLM / MAT8 family signature.

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVM]-[RQ]-x(2)-G.

NAME: BTG1 family signature 1.

CONSENSUS: V-x(2)-[HP]-W-[FY]-[AP]-E-x-P-x-K-G-x-[GA]-[FY]-R-C-[IV]-[RH]-[IV].
NAME: BTG1 family signature 2.

CONSENSUS: [LVJ-P-x-[DE]4UWJ-[STl-[LIVM^W-^VJ-D.P-x-E-V-[Sq-x.(RQ]-x.G^ 
NAME: Cullin family signature. 

CoSuft ™•;«H"V^»(2)-L-^(DEQJ.[KRHNQ).x-Y-IL^^^ 

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature.

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S.
NAME: G10 protein signature 1.

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.

NAME: G10 protein signature 2.

CONSENSUS: C.x-H-C-G-C-IKRH)^-C-(SA]. 

NAME: Glucokinase regulatory protein family signature.

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K.
NAME: GTP1/OBG family signature.

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G.

NAME: HIT family signature. 

CONSENSUS: [^sGAi'^*^^"^'^*'"'*^'^*'^''-"-^'^'^^ 

NAME: Caseins alpha/beta signature. 
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A.

NAME: Clathrin adaptor complexes medium chain signature 1.

^sS; ™°^^'-*-R-»».3MGAD].x(2)-IHY]<2)-N-v[UVMAFY^^^ 

NAME: Clathrin adaptor complexes medium chain signature 2.
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature.
CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1.

CONSENSUS: F.E-E-G-x.IUVMFl-Y-[EDl-M>-x(2)-N.[QEl-S-C-[RKHK2). 
NAME: Ependymins signature 2.

CONSENSUS: [QE3-[UVMAJ.F-x(2VP-[STAl-[FYl-C-IDE]-[GA)-[LIVM]-x(2H^^^ 

NAME: Syntaxin / epimorphin family signature.

CONSENSUS: lRQ]-x(3)-fLIVMA]-x(2HLIVMl-fESH].x(2Hl^
CONSENSUS: [Lrv^-[FS]-x(2)-[LIVMJ.x(3HLIVT]-x(2)^J.[GADEQ).x(2)-ELIV^
CONSENSUS: [LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.
CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.

CONSENSUS: ILIVMFYHl-[LIVMFY].x<:.[NQRHS].Y.x-[PARH].x-(GLJ-N-[^ 

NAME: Fetuin family signature 1.

NAME: Fetuin family signature 2.
CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.
NAME: Legume lectins beta-chain signature.
CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST].

NAME: Legume lectins alpha-chain signature. 

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST].

NAME: Vertebrate galactoside-binding lectin signature.

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-
CONSENSUS: [DENKHS]-[LIVMFC].

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature. 
CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature. 

CONSENSUS: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-
CONSENSUS: x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature. 

CONSENSUS: M-x-[GAC]-V-M-A-G-[LIVM](2).

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](4)-[SA]-T-x(2)-[DNKS]-x-W-x(9,13)-[LIV]-W-x(2)-C.

NAME: PMP-22 / EMP / MP20 family signature 2.

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature.
CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A.

NAME: Yeast PIR proteins repeats signature. 

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA].

NAME: Seminal vesicle protein I repeats signature.

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV].

NAME: Seminal vesicle protein II repeats signature.
CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature. 

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A.

NAME: Spermadhesins family signature 1.
CONSENSUS: C-C-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T.

NAME: Spermadhesins family signature 2.

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C.

NAME: Stress-induced proteins SRP1/TIP1 family signature.
CONSENSUS: P-W-Y-[ST](2)-R-L.

NAME: Glypicans signature. 

CONSENSUS: C-x(2K.x-G-[Ln^-x(4)-P<:.x(2HFYl-C-xa)-[UVM)-x<2H5-C. 

NAME: Syndecans signature. 

CONSENSUS: [FY]-R-[IM14KR1-K(2)-D-E^-S-Y. 

NAME: Tissue factor signature. 

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E.
NAME: Translationally controlled tumor protein signature 1.

CONSENSUS: [IA]-G-[GASl-N-[PA^S-A-E-(GDE}-[PAGE].x(0,lHDEG^x-^)ENl-x(2)-[DEl. 
NAME: Translationally controlled tumor protein signature 2.

CONSENSUS: [FLl.[FYl.[IVTl-G-E.x-[MA]-x{2.5)-[DEN14GASJ-x-[LV]-[AVl-x(3)-IF^ 
CONSENSUS: [DE]. 

NAME: Tub family signature 1.

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q.
NAME: Tub family signature 2.

CONSENSUS: A-F.[AG] ^[SAC]-[LIVMl-(STl-S-F-x-[GSTJ.K-x-A-C-E. 

NAME: HCP repeats signature. 
CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7). 


1078 


PCT/IBOO/01496 

NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T. 

NAME: Cell cycle proteins ftsW / rodA / spoVE signature 

NAME: Enterobacterial virulence outer membrane protein signature 1.
CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein signature 2.
CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F>.

NAME: Hydrogenases expression/synthesis hypA family signature 

Sensus- 

NAME: Hydrogenases expression/synthesis hupF/hypC family signature.
CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV].

NAME: Staphylocoagulase repeat signature. 

CONSENSUS: A-R-P-x(3)-K-x.S-x-T-N.A-Y-N.V-T-T-x(2HDNl^.x(3).Y-G. 

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1.

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).

NAME: Dehydrins signature 2.

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature.

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM].

NAME: Oleosins signature. 

CONSENSUS- ^^^^-1^*<2HAG3-A(2HLIVMJ^SADJ.T-P.[m^ 

NAME: Small hydrophilic plant seed proteins signature.
CONSENSUS: G-[EQ]-T-V-V-P-G-G-T.

NAME: Pathogenesis-related proteins Bet v I family signature.

CONffiNSUsi ^f2V[LIVMF].x(4)-E-x(2)-[CSTAEN]-x(8.9HGND]-G-r^^^ 

NAME: Pollen proteins Ole e I family signature. 
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R. 

NAME: Thaumatin family signature. 

CONSENSUS: G-x-IGF]-x-C-x-T-[GA]-CK:-x(1,2)-G.x(2.3)-C. 
NAME: Mrp family signature. 

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.

NAME: Glucose inhibited division protein A family signature 1.
CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2 
CONSENSUS: A-G^^x-[NTI-G-x(2>^-Y.x-E.fSAGK3)-fQSl-G.IU^ 

NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].
NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.
NAME: Protein smpB signature.

CONSENSUS: [TA]-G-[LIVM]-x-L-x-G-x-E-[LIVM]-[KQ]-[SA]-[LIVM].
NAME: Hypothetical cof family signature 1.

CONSENSUS: IUVFYANl-fLIVMFA)-x<2)-I>.[LIVMH-fNDJ-G-T-[I^ 
NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-
CONSENSUS: [LMP]-x(2)-[GAS].

NAME: RIO1/ZK632.3/MJ0444 family signature.
CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME: SUA5/yciO/yrdC family signature. 

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].

NAME: Uncharacterized protein family UPF0001 signature.
CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].

NAME: Uncharacterized protein family UPF0003 signature.

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-D-x(3)-[LIVT](3)-P-N-x(2)-[LIVMF](2)-
CONSENSUS: x(5)-N.

NAME: Uncharacterized protein family UPF0004 signature.
CONSENSUS: [UVMl-x4LIVMT)-x(2)-G-C-x(3)<:^STANl-(fT]-C-x^ 

NAME: Uncharacterized protein family UPF0005 signature.

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-
CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.
CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].

NAME: Uncharacterized protein family UPF0006 signature 2.
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3.

CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature.
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature.
CONSENSUS: [GTA]-x(2)-[IVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G.

NAME: Uncharacterized protein family UPF0015 signature.

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q.

NAME: Uncharacterized protein family UPF0016 signature.
CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature. 

CONSENSUS: D-x(8)-[GN]-[LFY]-x(4)-[DET]-[LY]-Y-x(3)-[ST]-x(7)-[IV]-x(2)-[PS]-x-
CONSENSUS: [LIVM]-x-[LIVM]-x(3)-[DN]-D.

NAME: Uncharacterized protein family UPF0019 signature.

CONSENSUS: L-P-V-(VTJ-[NQL)-F-[ATI-A-G-G-(LIV)-A-T-P-A-D-A-A-[U43. 

NAME: Uncharacterized protein family UPF0020 signature.
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.

NAME: Uncharacterized protein family UPF0021 signature.
CONSENSUS: C-K'X(2)-F-x(4)-E-x(22,23)-S-G^-K-D. 

NAME: Uncharacterized protein family UPF0023 signature.
CONSENSUS: D-x-D-E-(LrVJ-L-x(4)-V.F-x(3)-S-K-0. 

NAME: Uncharacterized protein family UPF0024 signature.
CONSENSUS: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC].

NAME: Uncharacterized protein family UPF0025 signature.
CONSENSUS: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G.

NAME: Uncharacterized protein family UPF0027 signature.
CONSENSUS: Q-CUVM].x-N-x-A-x4LIVM]-P-xJ.x(6).[LIVM].P-I^^^ 

NAME: Uncharacterized protein family UPF0028 signature.
CONSENSUS: [GA]-[GS]-G-[GA]-A-R-G-x-[SA]-H-x-G-x(9)-[IV]-A-[IV]-D-A(2)-[GA]-G-x-S-
CONSENSUS: x-G.

NAME: Uncharacterized protein family UPF0029 signature.

CONSENSUS: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-
CONSENSUS: G-x(2)-[LIVM]-G.

NAME: Uncharacterized protein family UPF0030 signature.
CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].

NAME: Uncharacterized protein family UPF0031 signature 1.

CONSENSUS: [SAVl-[IVW]-[LVA]-[UVJ-G-[PNS)^-L-(GPl-x-[DENOTl. 

NAME: Uncharacterized protein family UPF0031 signature 2.
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].

NAME: Uncharacterized protein family UPF0032 signature. 

CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM].

NAME: Uncharacterized protein family UPF0033 signature.
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature.

CONSENSUS: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC].

NAME: Uncharacterized protein family UPF0035 signature.
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G.

NAME: Uncharacterized protein family UPF0036 signature.

CONSENSUS: H-x-S-G-H-{GAhx(3)-PE)-x(3HIJ^]-x<5)-P.x(3HLIVMl-P-x.H-G-[DE^ 

NAME: Uncharacterized protein family UPF0038 signature. 
CONSENSUS: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV].

NAME: Uncharacterized protein family UPF0044 signature.

CONSENSUS ^^^■*<^>-^-*<3HKRHSGA3-x-tGAJ-H-x-L-x-P-tLIVl-xa)-^^ 

NAME: Uncharacterized protein family UPF0047 signature 
CONSENSUS: S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIV].

NAME: Uncharacterized protein family UPF0054 signature.
CONSENSUS: H-[GS]-x-L-H-L-[LI]-G-[FYW]-D-H.

NAME: Uncharacterized protein family UPF0057 signature.

CONSENSUS: [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN].

NAME: Hypothetical YER057c/yijV family signature. 

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(2)-[AK]-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature.

CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NSK]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature.

CONSENSUS: [NHY]-R-[LI]-D-x(2)-T-[ST]-G-[LIVMA]-[LIVMF](2)-[LIVMFG]-[SGAC].
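
The consensus entries listed above use a dash-separated pattern notation: a single letter denotes one residue, x denotes any residue, square brackets list allowed residues, curly braces list excluded residues, a parenthesized count or range gives repetition, and < or > pins the pattern to the N- or C-terminus. The following is a minimal sketch, assuming that reading of the notation, of how such a pattern might be converted into a regular expression and scanned against a protein sequence held in electronic form. The function names consensus_to_regex and find_signature, and the test sequence, are hypothetical illustrations and are not part of the disclosure.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def consensus_to_regex(consensus: str) -> str:
    """Translate a dash-separated consensus pattern into a regex string."""
    body = consensus.strip().rstrip(".")
    pieces = []
    for element in body.split("-"):
        at_start = element.startswith("<")  # '<' pins the match to the N-terminus
        at_end = element.endswith(">")      # '>' pins the match to the C-terminus
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(pieces)


def find_signature(consensus: str, protein: str):
    """Return (start, end, matched_text) for each non-overlapping match."""
    regex = re.compile(consensus_to_regex(consensus))
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(protein)]


if __name__ == "__main__":
    # Hypothetical sequence containing one octamer repeat.
    print(find_signature("A-G-Y-G-S-T-x-T.", "MKAGYGSTLTAA"))
```

A pattern element that does not fit the assumed notation, such as one damaged in transcription, raises ValueError rather than being silently skipped, so that a corrupted entry is not mistaken for a signature with no matches.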
We claim:

1. An assemblage, comprising at least one nucleic acid molecule having the
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23blO; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; 
hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; 
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0;
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6;
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; 
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4;
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23;
hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; 
hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6;
hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7;
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5;
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2;
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3;
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5;
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3;
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5;
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5;
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3;
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21;
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2;
hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7;
hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9;
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0;
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6;
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8;
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20;
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4;
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; their complements; and variants thereof.

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16f21; hfbr2_16k22;
hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23f2; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; 
hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; hfbr2_64all; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; 
hfbr2_72dl3; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24;
hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; and hfbrl_10. 


4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24al5; 
hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4;
hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; their complements; and 
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24e23; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; their complements; and 
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lall; hmcfl_lc23;
hmcfl_lel5; hmcfl_lgl3; their complements; and variants thereof. 

7. An assemblage, comprising at least one nucleic acid molecule having the
sequence of a clone selected from the group consisting of: hmcfl_lc23; hmcfl_lgl3; their
complements; and variants thereof. 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hhtes3_ln3; htes3_14g5; 
htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; htes3_15c6; 
htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23;
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21;
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9;
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; 
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0;
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22;
Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: htes3_14g5; htes3_14pl4; 
htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; htes3_17fl0; Htes3_18f3;
htes3_19fl9; htes3_19jl7; htes3_20c21; htes3_21n23; htes3_22c23; htes3_22nl3;
Htes3_23nl9; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7;
htes3_2hl5; htes3_2119; htes3_2m20; htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24;
htes3_35pl7; htes3_4b4; htes3_4fl7; htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06;
htes3_6b21; htes3_6dl6; htes3_72kll; htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16gl8; hfbr2_2kl4; 
Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; hutel_20mll; their complements; and 
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_2b5; 
htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; hutel_19g22; hutel_24j6; 
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_2dl5; htes3_35e21;
hutel_2h3; their complements; and variants thereof. 

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23124; hfbr2_2il7; 
hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8; 
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; their
complements; and variants thereof. 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_3g8; hfbr2_62ol7; 
hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; hfkd2_46j20; htes3_17117;
htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; htes3_35kl6; htes3_35nl2;
htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; their complements; and variants
thereof. 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23bl0; hfbr2_3cl8;
hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; hfbr2_82i24 (hfbrl_10);
htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; htes3_7p9;
htes3_8ml0; hutel_1811; their complements; and variants thereof.

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b21; hfbr2_23nl6; 
hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 (hfbrl_10e4); hfbr2_82il7
(hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; htes3_lcl; hhtes3_ln3;
htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; htes3_8e24; hutel_20g21; 
hutel_22d2; hutel_22el2; their complements; and variants thereof. 

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16il2; hfbr2_16112; 
hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; hfbr2_82c20 (hfbrl_10c20); 
hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); hfkd2_24al5; hfkd2_3il3;
hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; htes3_2ol3; htes3_27k4; htes3_2hl;
htes3_35k24; hutel_19fl9; hutel_24cl9; their complements; and variants thereof.

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_46kl9; hfkd2_47a4; 
htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; hutel_li2; their complements; and
variants thereof. 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; hutel_19gl9; hutel_19g22;
hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; hutel_20g21; hutel_20hl3; 
hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; hutel_22el2; hutel_22n2; 
hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; hutel_24ell; hutel_24j6; 
hutel_2h3; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; hutel_21dl5; hutel_22o2; 
hutel_23gll; their complements; and variants thereof.

21. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8;
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7;
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5;
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all;
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2;
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22;
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5;
hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6;
hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4;
hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2;
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111;
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7;
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3;
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22;
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5; 
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811;
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2;
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8;
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5;
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all;
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8;
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; 
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; complements of the nucleic acid
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16f21; hfbr2_16k22; hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23f2; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20;
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0;
hfbr2_64all; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64k24;
hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; hfbr2_72dl3; hfbr2_72ml6;
hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6;
hfbrl_10; complements of the nucleic acid sequences; and variants thereof.

24. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of:
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5;
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20;
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4;
hfkd2_4mll; complements of the nucleic acid sequences; and variants thereof.

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hfkd2_lj9;
hfkd2_24e23; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8;
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; complements of the nucleic acid
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lc23; hmcfl_lgl3; complements of the nucleic acid sequences; and variants thereof.

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; 
Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; 
htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3;
htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21;
htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23;
htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; 
htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; 
htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; 
htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4;
htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2;
htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; 
htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll;
htes3_6dl6; htes3_72kll; Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; 
htes3_7j8; htes3_7pl0; htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; complements of the nucleic acid 
sequences; and variants thereof. 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: htes3_14g5;
htes3_14pl4; htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8;
htes3_17fl0; htes3_17nl8; Htes3_18f3; htes3_19fl9; htes3_19jl7; htes3_20c21;
htes3_21n23; htes3_22c23; htes3_22nl3; Htes3_23nl9; htes3_27ol4; htes3_28dl4;
htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7; htes3_2hl5; htes3_2ll9; htes3_2m20;
htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; htes3_35pl7; htes3_4b4; htes3_4fl7;
htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; htes3_6b21; htes3_6dl6; htes3_72kll;
htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16gl8; hfbr2_2kl4; Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; 
hutel_20mll; complements of the nucleic acid sequences; and variants thereof.

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_2b5; htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; 
hutel_19g22; hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

32. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_2dl5; htes3_35e21; hutel_2h3; complements of the nucleic acid sequences; and
variants thereof. 

33. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23124; hfbr2_2il7; hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8;
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; 
complements of the nucleic acid sequences; and variants thereof. 

34. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of:
hfbr2_3g8; hfbr2_62ol7; hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7;
hfkd2_46j20; htes3_17117; Htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5;
htes3_35kl6; htes3_35nl2; htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3;
complements of the nucleic acid sequences; and variants thereof.

35. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23bl0; hfbr2_3cl8; hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112;
hfbr2_82i24 (hfbrl_10); htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8;
htes3_7p9; htes3_8ml0; hutel_1811; complements of the nucleic acid sequences; and
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23b21; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4
(hfbrl_10e4); hfbr2_82il7 (hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll;
htes3_lcl; hhtes3_ln3; htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll;
htes3_8e24; hutel_20g21; hutel_22d2; hutel_22el2; complements of the nucleic acid
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of:
hfbr2_16il2; hfbr2_16112; hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24;
hfbr2_82c20 (hfbrl_10c20); hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4);
hfkd2_24al5; hfkd2_3il3; hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6;
htes3_2ol3; htes3_27k4; htes3_2hl; htes3_35k24; hutel_19fl9; and hutel_24cl9;
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; hfkd2_47a4; htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9;
hutel_li2; complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9;
hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9;
hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2;
hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; 
hutel_24ell; hutel_24j6; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

40. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2;
hutel_21dl5; hutel_22o2; hutel_23gll; complements of the nucleic acid sequences; and
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the
group consisting of hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22;
hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3;
hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23l24; hfbr2_23nl6;
hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7;
hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0;
hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8;
hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0;
hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6;
hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24;
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112;
hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23;
hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7;
hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7;
hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10;
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5;
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20;
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4;
hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; hhtes3_ln3;
htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24;
htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll;
htes3_17fl0; htes3_17ll7; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817;
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2;
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23;
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22;
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5;
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2ll9; htes3_2ml8;
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21;
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9;
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9;
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll;
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0;
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22;
Htes3_9i20; Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811;
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2;
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5;
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll;
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants
thereof.

42. A polypeptide encoded by the nucleic acid molecule according to claim 41. 

43. An antibody or fragment thereof that is capable of binding to a specific portion
of the peptide according to claim 42. 

44. A pharmaceutical composition, comprising (a) an effective amount of a
pharmaceutical agent, wherein said pharmaceutical agent is selected from the group consisting
of the polypeptide according to claim 42, variants or functional derivatives thereof, and
antibodies thereto; and (b) a physiologically acceptable carrier or excipient.

45. An expression vector comprising the nucleic acid molecule of claim 41 or a
fragment thereof, and optionally a promoter operably linked to said nucleic acid molecule or 
said fragment. 

46. A method for recombinantly producing a desired peptide, comprising expressing 
in a host cell a peptide encoded by the nucleic acid molecule according to claim 41. 